Abstract

In this paper we propose efficient algorithms for solving constrained online convex optimization problems. Our motivation stems from the observation that most algorithms proposed for online convex optimization require a projection onto the convex set $\mathcal{K}$ from which the decisions are made. The projection is straightforward for simple shapes (e.g., a Euclidean ball). For arbitrary complex sets, however, it is the main computational challenge and may be inefficient in practice.

In this paper, we consider an alternative online convex optimization problem. Instead of requiring that decisions belong to $\mathcal{K}$ in every round, we only require that the constraints defining $\mathcal{K}$ be satisfied in the long run. By turning the problem into an online convex-concave optimization problem, we propose an efficient algorithm that achieves an $O(\sqrt{T})$ regret bound and an $O(T^{3/4})$ bound on the violation of constraints. We then modify the algorithm to guarantee that the constraints are satisfied in the long run. This gain comes at the price of an $O(T^{3/4})$ regret bound.

Our second algorithm is based on the mirror prox method for solving variational inequalities (Nemirovski, 2005). It achieves an $O(T^{2/3})$ bound for both the regret and the violation of constraints when the domain $\mathcal{K}$ can be described by a finite number of linear constraints. Finally, we extend the results to the setting where we only have partial access to the convex set $\mathcal{K}$. For this setting we propose a multipoint bandit feedback algorithm with the same bounds in expectation as our first algorithm.

Keywords: online convex optimization, convex-concave optimization, bandit feedback, 
variational inequality 

1. Introduction 

Online convex optimization has recently emerged as a primitive framework for designing efficient algorithms for a wide variety of machine learning applications (Cesa-Bianchi and Lugosi, 2006). In general, an online convex optimization problem can be formulated as a repeated game between a learner and an adversary. At each iteration $t$:

- the learner first presents a solution $\mathbf{x}_t \in \mathcal{K}$, where $\mathcal{K} \subseteq \mathbb{R}^d$ is a convex domain representing the solution space;
- it then receives a convex function $f_t(\mathbf{x}): \mathcal{K} \mapsto \mathbb{R}_+$ and suffers the loss $f_t(\mathbf{x}_t)$ for the submitted solution $\mathbf{x}_t$.

The objective of the learner is to generate a sequence of solutions $\mathbf{x}_t \in \mathcal{K}$, $t = 1, 2, \ldots, T$, that minimizes the regret $\Re_T$ defined as
$$\Re_T = \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{K}} \sum_{t=1}^{T} f_t(\mathbf{x}). \qquad (1)$$



Regret measures the difference between two quantities: the cumulative loss of the learner's strategy, and the minimum possible loss had the loss functions been known in advance so that the learner could choose the best fixed action in hindsight. When $\Re_T$ is sub-linear in the number of rounds, that is, $o(T)$, we call the solution Hannan consistent (Cesa-Bianchi and Lugosi, 2006). This implies that the learner's average per-round loss approaches the average per-round loss of the best fixed action in hindsight. Note that the performance bound must hold for any sequence of loss functions, in particular when the sequence is chosen adversarially.

Many successful algorithms have been developed over the past decade to minimize the regret in online convex optimization.

- The problem was initiated in the remarkable work of Zinkevich (2003). It presents an algorithm based on gradient descent with projection that guarantees a regret of $O(\sqrt{T})$ when the set $\mathcal{K}$ is convex and the loss functions are Lipschitz continuous within $\mathcal{K}$.
- Hazan et al. (2007) and Shalev-Shwartz and Kakade (2008) proposed algorithms with logarithmic regret bounds for strongly convex loss functions. In particular, the algorithm of Hazan et al. (2007) is based on the online Newton step and covers the general class of exp-concave loss functions.
- Notably, the simple gradient-based algorithm also achieves an $O(\log T)$ regret bound for strongly convex loss functions with an appropriately chosen step size.
- Bartlett et al. (2007) generalize these results to the setting where the algorithm adapts to the curvature of the loss functions without any prior information.

A modern view of these algorithms casts the problem as the task of following the regularized leader (Rakhlin, 2009). Using a game-theoretic analysis, Abernethy et al. (2009) showed that both bounds are tight in the minimax sense: $O(\sqrt{T})$ for Lipschitz continuous loss functions and $O(\log T)$ for strongly convex ones.

Most existing algorithms require a projection step at each iteration to get back to the feasible region. The computational cost of this step is therefore of crucial importance for their performance. To motivate the setting addressed in this paper, consider a popular algorithm for minimizing the regret, online gradient descent (OGD) (Zinkevich, 2003). At each iteration $t$, after receiving the convex function $f_t(\mathbf{x})$, the learner computes the gradient $\nabla f_t(\mathbf{x}_t)$ and updates the solution by solving the following optimization problem



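$$\mathbf{x}_{t+1} = \Pi_{\mathcal{K}}\big(\mathbf{x}_t - \eta \nabla f_t(\mathbf{x}_t)\big) = \arg\min_{\mathbf{x} \in \mathcal{K}} \big\|\mathbf{x} - \mathbf{x}_t + \eta \nabla f_t(\mathbf{x}_t)\big\|^2, \qquad (2)$$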



where $\Pi_{\mathcal{K}}(\cdot)$ denotes the projection onto $\mathcal{K}$ and $\eta > 0$ is a predefined step size. Although OGD is simple, its per-iteration cost determines whether it is practical. For general convex domains, solving (2) is an offline convex optimization problem in its own right and can be computationally expensive. For example, in applications such as distance metric learning and matrix completion, $\mathcal{K}$ is a positive semidefinite cone. Projecting an updated solution back into the cone then requires a full eigen-decomposition of a matrix. Efficient projection algorithms have recently been developed for specific domains, for example the $\ell_1$ ball (Duchi et al., 2008; Liu and Ye, 2009). When the domain $\mathcal{K}$ is complex, however, the projection step remains involved or computationally burdensome.
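To make update (2) concrete, the Python sketch below implements one OGD step for the easy case mentioned above. It assumes $\mathcal{K}$ is a Euclidean ball of radius `radius`, where the projection has a closed form. It is our own illustration, not code from this work. The helper names `project_ball` and `ogd_step` and the quadratic test loss are hypothetical.

```python
import math


def project_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x|| <= radius} (closed form)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def ogd_step(x, grad, eta, radius):
    """One OGD update (2): x_{t+1} = Pi_K(x_t - eta * grad f_t(x_t)), with K a ball."""
    y = [xi - eta * gi for xi, gi in zip(x, grad)]
    return project_ball(y, radius)


# Hypothetical loss f_t(x) = ||x - c||^2 / 2, whose gradient is x - c.
c = [3.0, 4.0]
x = [0.0, 0.0]
for _ in range(100):
    x = ogd_step(x, [xi - ci for xi, ci in zip(x, c)], eta=0.1, radius=1.0)
print(x)  # close to [0.6, 0.8], the point of the unit ball nearest to c
```

For a general $\mathcal{K}$, `project_ball` would have to be replaced by the solution of an auxiliary convex program. That per-iteration cost is exactly what the long term constraint setting tries to avoid.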

To avoid the cost of the projection step, we consider an alternative online learning problem. Instead of requiring $\mathbf{x}_t \in \mathcal{K}$, we only require the constraints that define $\mathcal{K}$ to be satisfied in the long run. The task then becomes to find a sequence of solutions $\mathbf{x}_t$, $t \in [T]$, that minimizes the regret (1) under the long term constraints, that is, $\sum_{t=1}^{T} \mathbf{x}_t / T \in \mathcal{K}$. We refer to this problem as online learning with long term constraints. In other words, instead of solving the projection problem (2) in every round, the learner may make decisions outside $\mathcal{K}$ in some rounds. The overall sequence of decisions, however, must obey the constraints in the end, up to a violation that vanishes on average.

From a different perspective, the proposed setup resembles regret minimization with side constraints, also called constrained regret minimization. Mannor and Tsitsiklis (2006) studied this problem, motivated by applications in wireless communication. In regret minimization with side constraints, the learner must minimize regret while also satisfying some side constraints on average over all rounds. Unlike our setting, the constraint set is controlled by the adversary and can vary arbitrarily from trial to trial. It has been shown that if the convex set is affected by both the decisions and the loss functions, the minimax optimal regret is generally unattainable online (Mannor et al., 2009).



One interesting application of constrained regret minimization is multi-objective online classification, where the learner aims to optimize more than one classification performance criterion at once. Bernstein et al. (2010) consider a simple two-objective version. The goal of the online classifier is to maximize the average true positive rate while also guaranteeing a bound on the false positive rate. Following the Neyman-Pearson risk, the intuitive approach is to optimize one criterion (maximizing the true positive rate) subject to an explicit constraint on the other (the false positive rate), which must hold on average over the sequence of decisions.

The constrained regret matching (CRM) algorithm proposed by Bernstein et al. (2010) solves this problem efficiently by relaxing the objective under mild assumptions on the single-stage constraint. The main idea of CRM is to fold the penalty that the learner pays to satisfy the constraint into the objective (the true positive rate). It does so by subtracting a positive constant at each decision step. It has been shown that CRM asymptotically satisfies the average constraint (the false positive rate) provided that the relaxation constant is above a certain threshold.

Finally, the proposed setting can be used in certain classes of online learning where it is sufficient to satisfy the constraints in the long run. One example is online-to-batch conversion (Cesa-Bianchi et al., 2004). If the received examples are i.i.d. samples, the solution for batch learning is the average of the solutions obtained over all trials. As a result, if the long term constraint is satisfied, then by the convexity of the constraints the average solution is guaranteed to belong to $\mathcal{K}$.
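The averaging argument can be checked numerically. The sketch below assumes a single hypothetical convex constraint $g(\mathbf{x}) = \|\mathbf{x}\|^2 - 1$, so $\mathcal{K}$ is the unit ball. It shows that an individual decision may lie outside $\mathcal{K}$ while the average decision satisfies $g \le 0$, as Jensen's inequality predicts.

```python
def average_and_constraint(xs, g):
    """Return g(mean of xs) and the mean of g(x_t).

    For convex g, Jensen's inequality gives g(mean) <= mean of g(x_t), so a
    nonpositive long-term sum of g(x_t) implies the average decision is feasible.
    """
    T = len(xs)
    mean = [sum(coords) / T for coords in zip(*xs)]
    return g(mean), sum(g(x) for x in xs) / T


def g_unit_ball(x):
    # Hypothetical convex constraint: ||x||^2 - 1 <= 0 describes the unit ball.
    return sum(v * v for v in x) - 1.0


xs = [[1.2, 0.0], [-0.8, 0.0], [0.0, 0.5]]   # the first decision lies outside K
g_avg, mean_g = average_and_constraint(xs, g_unit_ball)
print(g_unit_ball(xs[0]) > 0, g_avg <= mean_g <= 0)  # True True
```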

In this paper, we describe and analyze a general framework for solving online convex optimization with long term constraints. We first show that a direct application of OGD cannot achieve, at the same time, a sub-linear bound on the violation of constraints and an $O(\sqrt{T})$ bound on the regret.

We then turn the problem into an online convex-concave optimization problem and propose an efficient adaptation of OGD for online learning with long term constraints. It achieves the same $O(\sqrt{T})$ regret bound as in the general setting, together with an $O(T^{3/4})$ bound on the violation of constraints. With a simple trick, the method can be turned into an algorithm that exactly satisfies the constraints in the long run, at the cost of an $O(T^{3/4})$ regret bound.

When $\mathcal{K}$ can be described by a finite number of linear constraints, we propose an alternative algorithm based on the mirror prox method (Nemirovski, 2005). It achieves an $O(T^{2/3})$ bound for both the regret and the violation of constraints.

Our framework also handles the case where $\mathcal{K}$ can be accessed only through a limited number of oracle evaluations. In the full-information version, the decision maker can observe the entire domain $\mathcal{K}$. In the partial-information (a.k.a. bandit) setting, the decision maker may only observe the values of the constraints defining $\mathcal{K}$ at a limited number of points. We show that the OGD-based algorithm generalizes to this setting by querying the value oracle for $\mathcal{K}$ at two points per round. It achieves the same bounds in expectation as in the full-information case.

In summary, the present work makes the following contributions:

• A general theorem showing that, in the online setting, a simple penalty-based method suffers a linear bound $O(T)$ on either the regret or the long term violation of the constraints. It therefore cannot achieve sub-linear bounds on both at the same time.

• A convex-concave formulation of online convex optimization with long term constraints, and an efficient OGD-based algorithm that attains a regret bound of $O(T^{1/2})$ and an $O(T^{3/4})$ violation of the constraints.

• A modified OGD-based algorithm for online convex optimization with long term constraints that has no constraint violation but an $O(T^{3/4})$ regret bound.

• An algorithm for online convex optimization with long term constraints, based on the mirror prox method, that achieves $O(T^{2/3})$ bounds on both the regret and the constraint violation.

• A multipoint bandit version of the basic algorithm with an $O(T^{1/2})$ regret bound and an $O(T^{3/4})$ violation of the constraints in expectation. It queries the value oracle for $\mathcal{K}$ at two points per round.

The remainder of the paper is structured as follows.

- Section 3 first examines a simple penalty-based strategy and shows that it cannot attain sub-linear bounds on both the regret and the long term violation of the constraints. It then formulates regret minimization as an online convex-concave optimization problem and applies OGD to solve it. The resulting algorithm allows the constraints to be violated in a controlled way; it is then modified so that the constraints are exactly satisfied in the long run.
- Section 4 presents our second algorithm, an adaptation of the mirror prox method.
- Section 5 extends the problem to the setting where we only have partial access to $\mathcal{K}$.
- Section 6 concludes the work with a list of open questions.
2. Notation and Setting

Before proceeding, we define the notation used throughout the paper and state the assumptions made for the analysis of the algorithms.

- Vectors are denoted by lower case bold letters, such as $\mathbf{x} \in \mathbb{R}^d$.
- Matrices are denoted by upper case letters such as $A$, and their pseudoinverse by $A^{\dagger}$.
- $[m]$ is shorthand for the set of integers $\{1, 2, \ldots, m\}$.
- $\|\cdot\|$ and $\|\cdot\|_1$ denote the $\ell_2$ (Euclidean) norm and the $\ell_1$-norm, respectively.
- $\mathbb{E}$ denotes expectation, and $\mathbb{E}_t$ denotes conditional expectation given all randomness in the first $t-1$ trials.

To facilitate our analysis, we assume that $\mathcal{K}$ can be written as an intersection of a finite number of convex constraints, that is,

$$\mathcal{K} = \{\mathbf{x} \in \mathbb{R}^d : g_i(\mathbf{x}) \le 0, \ i \in [m]\},$$

where each $g_i(\cdot)$, $i \in [m]$, is a Lipschitz continuous function. Like many other works on online convex optimization, such as Flaxman et al. (2005), we assume that $\mathcal{K}$ is bounded. That is, there exist constants $R > 0$ and $r < 1$ such that $\mathcal{K} \subseteq R\mathbb{B}$ and $r\mathbb{B} \subseteq \mathcal{K}$, where $\mathbb{B}$ is the unit $\ell_2$ ball centered at the origin. For ease of notation, we write $\mathcal{B} = R\mathbb{B}$.

We focus on online convex optimization, where the goal is to achieve low regret with respect to a fixed decision over a sequence of loss functions. Our setting differs from general online convex optimization in one respect. We do not require $\mathbf{x}_t \in \mathcal{K}$, or equivalently $g_i(\mathbf{x}_t) \le 0$ for $i \in [m]$, in every round $t \in [T]$. We only require the constraints to be satisfied in the long run, namely $\sum_{t=1}^{T} g_i(\mathbf{x}_t) \le 0$ for $i \in [m]$. The problem is then to find a sequence of solutions $\mathbf{x}_t$, $t \in [T]$, that minimizes the regret (1) under these long term constraints. Formally, we would like to solve the following optimization problem online:

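$$\min_{\mathbf{x}_1, \ldots, \mathbf{x}_T \in \mathcal{B}} \ \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{K}} \sum_{t=1}^{T} f_t(\mathbf{x}) \quad \text{s.t.} \quad \sum_{t=1}^{T} g_i(\mathbf{x}_t) \le 0, \ i \in [m]. \qquad (3)$$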
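To make the two quantities in (3) concrete, here is a small Python sketch that evaluates the regret (1) and the long term violations $\sum_{t} g_i(\mathbf{x}_t)$ for a given sequence of decisions. The function name and the toy one-dimensional instance ($f_t(x) = -x$, $g(x) = x - 0.5$) are hypothetical. The comparator is supplied by the caller and is assumed to be the minimizer over $\mathcal{K}$.

```python
def regret_and_violation(losses, constraints, xs, x_best):
    """Evaluate problem (3) for a finished game.

    losses: callables f_1..f_T; constraints: callables g_1..g_m;
    xs: the learner's decisions x_1..x_T (in B, not necessarily in K);
    x_best: the comparator, assumed to minimize sum_t f_t over K.
    """
    regret = sum(f(x) for f, x in zip(losses, xs)) - sum(f(x_best) for f in losses)
    violations = [sum(g(x) for x in xs) for g in constraints]
    return regret, violations


def f_neg(x):
    return -x            # toy loss: the learner wants x as large as possible


def g_half(x):
    return x - 0.5       # toy constraint: K = {x : x <= 0.5}


xs = [1.0, 0.0, 1.0, 0.0]  # rounds 1 and 3 leave K, rounds 2 and 4 compensate
print(regret_and_violation([f_neg] * 4, [g_half], xs, 0.5))  # (0.0, [0.0])
```

In this toy game the learner leaves $\mathcal{K}$ in half of the rounds, yet the long term constraint holds with equality and the regret is zero.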

For simplicity, we focus on a finite-horizon setting where the number of rounds $T$ is known in advance. This assumption can be relaxed using standard techniques (see, e.g., Cesa-Bianchi and Lugosi, 2006). Note two features of (3):

(i) the solutions come from the ball $\mathcal{B} \supseteq \mathcal{K}$ instead of $\mathcal{K}$;
(ii) the constraint functions are fixed and given in advance.

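Like most online learning algorithms, we assume that both the loss functions and the constraint functions are Lipschitz continuous. That is, there exist constants $L_f$ and $L_g$ such that

$$|f_t(\mathbf{x}) - f_t(\mathbf{x}')| \le L_f \|\mathbf{x} - \mathbf{x}'\|, \qquad |g_i(\mathbf{x}) - g_i(\mathbf{x}')| \le L_g \|\mathbf{x} - \mathbf{x}'\|$$

for any $\mathbf{x}, \mathbf{x}' \in \mathcal{B}$ and $i \in [m]$. For simplicity of analysis, we use $G = \max\{L_f, L_g\}$ and

$$F = \max_{t \in [T]} \max_{\mathbf{x}, \mathbf{x}' \in \mathcal{K}} f_t(\mathbf{x}) - f_t(\mathbf{x}') \le 2 L_f R, \qquad D = \max_{i \in [m]} \max_{\mathbf{x} \in \mathcal{B}} g_i(\mathbf{x}) \le L_g R.$$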

Finally, we define the notion of a Bregman divergence. Let $\phi(\cdot)$ be a strictly convex function defined on a convex set $\mathcal{K}$. The Bregman divergence between $\mathbf{x}$ and $\mathbf{x}'$ is

$$B_{\phi}(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x}) - \phi(\mathbf{x}') - (\mathbf{x} - \mathbf{x}')^{\top} \nabla \phi(\mathbf{x}').$$

It measures how much $\phi(\cdot)$ deviates at $\mathbf{x}$ from its linear approximation at $\mathbf{x}'$.
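The definition can be evaluated directly. The sketch below (helper names are ours) computes $B_\phi$ for two standard choices of $\phi$:

- the squared Euclidean norm $\phi(\mathbf{x}) = \|\mathbf{x}\|^2/2$, for which $B_\phi(\mathbf{x}, \mathbf{x}') = \|\mathbf{x} - \mathbf{x}'\|^2/2$;
- the negative entropy $\phi(\mathbf{x}) = \sum_i x_i \log x_i$ on vectors with positive entries, for which $B_\phi$ is the generalized Kullback-Leibler divergence.

```python
import math


def bregman(phi, grad_phi, x, y):
    """B_phi(x, y) = phi(x) - phi(y) - (x - y)^T grad_phi(y)."""
    return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, grad_phi(y)))


def half_sq_norm(x):
    return 0.5 * sum(v * v for v in x)


def half_sq_norm_grad(x):
    return list(x)


def neg_entropy(x):          # defined for strictly positive entries
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]


print(bregman(half_sq_norm, half_sq_norm_grad, [1.0, 2.0], [0.0, 0.0]))  # 2.5
```

On the probability simplex, the negative-entropy case reduces to $\sum_i x_i \log(x_i / x'_i)$.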
3. Online Convex Optimization with Long Term Constraints

In this section we present and analyze our gradient descent based algorithms for online convex optimization with long term constraints. We first describe an algorithm that is allowed to violate the constraints. We then apply a simple trick to obtain a variant that exactly satisfies the constraints in the long run.

Before we state our formulation and algorithms, let us review a few alternative techniques that do not need explicit projection.

A straightforward approach is to add a self-concordant barrier function for $\mathcal{K}$ to the objective, so that the objective diverges at the boundary of the set. The modified problem can then be treated as an unconstrained minimization problem and solved without projection steps. Following the analysis of Abernethy et al. (2012), with an appropriately designed update procedure, this approach can guarantee a regret bound of $O(\sqrt{T})$ without any violation of the constraints.

A similar idea is used by Abernethy et al. (2008) for online bandit learning. Narayanan and Rakhlin use it in a random walk approach to regret minimization, which in effect replaces the difficulty of projection with the difficulty of sampling. Even for linear Lipschitz cost functions, the random walk approach requires sampling from a Gaussian distribution whose covariance is given by the Hessian of the barrier of $\mathcal{K}$. This has the same time complexity as inverting a matrix.

The main limitation of these approaches is that they require the Hessian matrix of the objective function to keep the updated solution inside $\mathcal{K}$. This makes them computationally unattractive for high dimensional data. In addition, outside well known cases, it is often unclear how to efficiently construct a self-concordant barrier for a general convex domain.

Another approach for online convex optimization with long term constraints is to add a term to the loss function that penalizes the violation of constraints. More specifically, we can define a new loss function $\hat{f}_t(\cdot)$ as

$$\hat{f}_t(\mathbf{x}) = f_t(\mathbf{x}) + \delta \sum_{i=1}^{m} [g_i(\mathbf{x})]_+, \qquad (4)$$

where $[z]_+ = \max(0, z)$ and $\delta > 0$ is a fixed constant that weights the penalty. We then run the standard OGD algorithm on the modified loss functions $\hat{f}_t(\cdot)$. The following theorem shows that this simple strategy cannot achieve sub-linear bounds on both the regret and the long term violation of constraints at the same time.
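The failure mode can be reproduced on a toy one-dimensional instance. The sketch below is a hypothetical example of our own, not the construction used in the proof of Theorem 1. It runs OGD on (4) with $f_t(x) = -x$ for every $t$, $g(x) = x - 0.5$, and $\mathcal{B} = [-1, 1]$. When $\delta < 1$, the penalty never outweighs the loss gradient. The iterates then settle at the boundary $x = 1$, and the cumulative violation grows linearly in $T$.

```python
def run_penalized_ogd(delta, T=100, eta=0.1):
    """OGD on hat f_t(x) = f_t(x) + delta * [g(x)]_+ over B = [-1, 1] (toy 1-D instance).

    Hypothetical instance: f_t(x) = -x for every t and g(x) = x - 0.5, so K = [-1, 0.5].
    Returns the cumulative violation sum_t g(x_t).
    """
    x, violation = 0.0, 0.0
    for _ in range(T):
        g = x - 0.5
        violation += g
        # Subgradient of -x + delta * max(0, g(x)); we pick 0 for the penalty at g = 0.
        grad = -1.0 + (delta if g > 0 else 0.0)
        x = min(1.0, max(-1.0, x - eta * grad))   # projection onto B = [-1, 1]
    return violation


print(run_penalized_ogd(delta=0.5) > 40, run_penalized_ogd(delta=2.0) < 10)  # True True
```

With $\delta = 2$ the violation stays small on this particular instance. A fixed $\delta$ cannot adapt to the loss sequence, however, and Theorem 1 states that for any fixed $\delta$ some sequence forces either linear regret or linear violation.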



Theorem 1 Given $\delta > 0$, there always exist a sequence of loss functions $\{f_t(\mathbf{x})\}_{t=1}^{T}$ and a constraint function $g(\mathbf{x})$ such that either

$$\sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{g(\mathbf{x}) \le 0} \sum_{t=1}^{T} f_t(\mathbf{x}) = \Omega(T) \quad \text{or} \quad \sum_{t=1}^{T} [g(\mathbf{x}_t)]_+ = \Omega(T)$$

holds. Here $\{\mathbf{x}_t\}_{t=1}^{T}$ is the sequence of solutions generated by OGD on the modified loss functions (4).

We defer the proof to the appendix, together with a simple analysis of OGD on the modified functions (4). The analysis shows that obtaining an $O(\sqrt{T})$ regret bound forces a linear bound on the long term violation of the constraints. The modified loss (4) fails mainly because the weight $\delta$ is fixed and does not depend on the solutions obtained so far. In the next subsection, we present a convex-concave formulation of online convex optimization with long term constraints. It removes this limitation of (4) by automatically adjusting the weights according to the constraint violation of the solutions obtained so far.

As mentioned before, our general strategy is to turn online convex optimization with long term constraints into a convex-concave optimization problem. We do not immediately generate a sequence of solutions that satisfies the long term constraints. Instead, we first consider a strategy that allows the constraints to be violated in some rounds, in a controlled way. We then modify this strategy to obtain a sequence of solutions that obeys the long term constraints.

Online convex optimization with long term constraints is clearly easier than the standard problem. Its optimal regret bound is nevertheless still of order $O(\sqrt{T})$, no better than in the standard setting. For instance, when the constraints are inactive on $\mathcal{B}$, the problem reduces to standard online convex optimization over $\mathcal{B}$, for which $O(\sqrt{T})$ is tight.

3.1 An Efficient Algorithm with $O(\sqrt{T})$ Regret Bound and $O(T^{3/4})$ Bound on the Violation of Constraints

Our approach starts from the observation that the constrained optimization problem $\min_{\mathbf{x} \in \mathcal{K}} \sum_{t=1}^{T} f_t(\mathbf{x})$ is equivalent to the following convex-concave optimization problem:

$$\min_{\mathbf{x} \in \mathcal{B}} \max_{\boldsymbol{\lambda} \in \mathbb{R}_+^m} \sum_{t=1}^{T} f_t(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i g_i(\mathbf{x}), \qquad (5)$$

Here $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)^{\top}$ is the vector of Lagrange multipliers associated with the constraints $g_i(\cdot)$, $i = 1, \ldots, m$, and belongs to the nonnegative orthant $\mathbb{R}_+^m$. To solve the online convex-concave problem, we extend the gradient-based approach for variational inequalities (Nemirovski, 1994) to (5). To this end, we consider the following regularized convex-concave function:

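$$\mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda}) = f_t(\mathbf{x}) + \sum_{i=1}^{m} \Big\{ \lambda_i g_i(\mathbf{x}) - \frac{\delta \eta}{2} \lambda_i^2 \Big\}, \qquad (6)$$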

where $\delta > 0$ is a constant whose value will be decided by the analysis. In (6), the regularizer $\delta \eta \lambda_i^2 / 2$ keeps $\lambda_i$ from becoming too large. A large $\lambda_i$ can produce a large gradient in $\mathbf{x}$, because $\nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda}) = \nabla f_t(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i \nabla g_i(\mathbf{x})$. This leads to unstable solutions and a poor regret bound. Restricting $\lambda_i$ to a bounded domain would achieve the same goal, but the quadratic regularizer is more convenient for our analysis.

Algorithm 1 shows the detailed steps of the proposed algorithm. Standard online convex optimization algorithms only update $\mathbf{x}$, whereas Algorithm 1 updates both $\mathbf{x}$ and $\boldsymbol{\lambda}$. Moreover, the modified loss (4) uses fixed weights for the constraints $\{g_i(\mathbf{x}) \le 0\}_{i=1}^{m}$. Algorithm 1 instead adjusts the weights $\{\lambda_i\}_{i=1}^{m}$ as the game proceeds, based on the constraint violations $\{g_i(\mathbf{x})\}_{i=1}^{m}$. This property is what allows Algorithm 1 to achieve sub-linear bounds on both the regret and the violation of constraints.
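Algorithm 1 Gradient based Online Convex Optimization with Long Term Constraints
1: Input: constraints $g_i(\mathbf{x}) \le 0$, $i \in [m]$, step size $\eta$, and constant $\delta > 0$
2: Initialization: $\mathbf{x}_1 = \mathbf{0}$ and $\boldsymbol{\lambda}_1 = \mathbf{0}$
3: for $t = 1, 2, \ldots, T$ do
4:   Submit solution $\mathbf{x}_t$
5:   Receive the convex function $f_t(\mathbf{x})$ and experience loss $f_t(\mathbf{x}_t)$
6:   Compute $\nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) = \nabla f_t(\mathbf{x}_t) + \sum_{i=1}^{m} \lambda_t^i \nabla g_i(\mathbf{x}_t)$ and $\nabla_{\lambda_i} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) = g_i(\mathbf{x}_t) - \eta \delta \lambda_t^i$
7:   Update $\mathbf{x}_t$ and $\boldsymbol{\lambda}_t$ by
       $\mathbf{x}_{t+1} = \Pi_{\mathcal{B}}\big(\mathbf{x}_t - \eta \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\big)$
       $\boldsymbol{\lambda}_{t+1} = \Pi_{[0, +\infty)^m}\big(\boldsymbol{\lambda}_t + \eta \nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\big)$
8: end for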
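The steps of Algorithm 1 can be sketched in Python as follows. This is our own illustration under stated assumptions, not a reference implementation:

- losses and constraints are supplied as gradient and value callables;
- $\mathcal{B}$ is a Euclidean ball;
- the toy instance ($f_t(x) = -x$, $g(x) = x - 0.5$, $R = 1$) and all helper names are hypothetical;
- the parameters follow Theorem 4 and Remark 5 with $m = 1$, $G = 1$, $D = 0.5$.

```python
import math


def project_ball(x, radius):
    """Euclidean projection onto B = {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= radius else [v * radius / norm for v in x]


def algorithm1(grad_fs, gs, grad_gs, dim, radius, eta, delta):
    """Sketch of Algorithm 1: gradient descent in x, projected gradient ascent in lambda.

    grad_fs: one callable per round returning grad f_t(x) (revealed after x_t is submitted);
    gs, grad_gs: the m constraint functions g_i and their gradients.
    Returns the submitted solutions x_1, ..., x_T.
    """
    x, lam = [0.0] * dim, [0.0] * len(gs)           # x_1 = 0, lambda_1 = 0
    xs = []
    for grad_f in grad_fs:
        xs.append(list(x))                          # submit x_t, then observe f_t
        grad_x = list(grad_f(x))                    # grad_x L_t = grad f_t + sum_i lam_i grad g_i
        for lam_i, grad_g in zip(lam, grad_gs):
            grad_x = [u + lam_i * v for u, v in zip(grad_x, grad_g(x))]
        grad_lam = [g(x) - eta * delta * lam_i for g, lam_i in zip(gs, lam)]
        x = project_ball([u - eta * v for u, v in zip(x, grad_x)], radius)
        lam = [max(0.0, li + eta * gi) for li, gi in zip(lam, grad_lam)]  # onto R^m_+
    return xs


# Toy instance (hypothetical): f_t(x) = -x, g(x) = x - 0.5, B = [-1, 1]; m = 1, G = 1, D = 0.5.
T = 10000
a = math.sqrt((1 + 1) * 1.0 + 2 * 1 * 0.5**2)       # a from Theorem 4 with R = 1
eta = 1.0 / (a * math.sqrt(T))                      # eta = R^2 / (a sqrt(T))
xs = algorithm1([lambda x: [-1.0]] * T, [lambda x: x[0] - 0.5], [lambda x: [1.0]],
                dim=1, radius=1.0, eta=eta, delta=2 * (1 + 1) * 1.0)
total_violation = sum(x[0] - 0.5 for x in xs)
print(total_violation / T)   # small average violation per round
```

No projection onto $\mathcal{K}$ is ever computed. Only the cheap projections onto the ball $\mathcal{B}$ and onto the nonnegative orthant appear.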



To analyze Algorithm 1, we first state the following lemma, which is the key to the main theorem on the regret bound and the violation of constraints.

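Lemma 2 Let $\mathcal{L}_t(\cdot, \cdot)$ be the function defined in (6), which is convex in its first argument and concave in its second argument. Then for any $(\mathbf{x}, \boldsymbol{\lambda}) \in \mathcal{B} \times \mathbb{R}_+^m$ we have

$$\mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}) - \mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda}_t) \le \frac{1}{2\eta}\Big(\|\mathbf{x} - \mathbf{x}_t\|^2 + \|\boldsymbol{\lambda} - \boldsymbol{\lambda}_t\|^2 - \|\mathbf{x} - \mathbf{x}_{t+1}\|^2 - \|\boldsymbol{\lambda} - \boldsymbol{\lambda}_{t+1}\|^2\Big) + \frac{\eta}{2}\Big(\|\nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2 + \|\nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2\Big).$$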


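Proof Following the analysis of Zinkevich (2003), convexity of $\mathcal{L}_t(\cdot, \boldsymbol{\lambda}_t)$ implies that

$$\mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda}_t) \le (\mathbf{x}_t - \mathbf{x})^{\top} \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) \qquad (7)$$

and concavity of $\mathcal{L}_t(\mathbf{x}_t, \cdot)$ gives

$$\mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}) - \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) \le (\boldsymbol{\lambda} - \boldsymbol{\lambda}_t)^{\top} \nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t). \qquad (8)$$

Adding (7) and (8) yields

$$\mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}) - \mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda}_t) \le (\mathbf{x}_t - \mathbf{x})^{\top} \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) + (\boldsymbol{\lambda} - \boldsymbol{\lambda}_t)^{\top} \nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t). \qquad (9)$$

Writing $\mathbf{x}_{t+1}$ in terms of $\mathbf{x}_t$ through the update rule and expanding, we get

$$\|\mathbf{x} - \mathbf{x}_{t+1}\|^2 \le \|\mathbf{x} - \mathbf{x}_t\|^2 - 2\eta (\mathbf{x}_t - \mathbf{x})^{\top} \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) + \eta^2 \|\nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2, \qquad (10)$$

where the inequality follows from the nonexpansiveness of the projection. Expanding $\|\boldsymbol{\lambda} - \boldsymbol{\lambda}_{t+1}\|^2$ in terms of $\boldsymbol{\lambda}_t$ in the same way, and plugging both bounds into (9), establishes the desired inequality. ■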
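Proposition 3 Let $\mathbf{x}_t$ and $\boldsymbol{\lambda}_t$, $t \in [T]$, be the sequence of solutions obtained by Algorithm 1. Then for any $\mathbf{x} \in \mathcal{B}$ and $\boldsymbol{\lambda} \in \mathbb{R}_+^m$, we have

$$\sum_{t=1}^{T} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}) - \mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda}_t) \le \frac{R^2 + \|\boldsymbol{\lambda}\|^2}{2\eta} + \frac{\eta T}{2}\big((m+1)G^2 + 2mD^2\big) + \frac{\eta}{2}\big((m+1)G^2 + 2m\delta^2\eta^2\big) \sum_{t=1}^{T} \|\boldsymbol{\lambda}_t\|^2. \qquad (11)$$

Proof We first bound the gradient terms on the right-hand side of Lemma 2. Using the inequality $(a_1 + a_2 + \cdots + a_n)^2 \le n(a_1^2 + a_2^2 + \cdots + a_n^2)$, we have

$$\|\nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2 \le (m+1)G^2(1 + \|\boldsymbol{\lambda}_t\|^2), \qquad \|\nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2 \le 2m(D^2 + \delta^2\eta^2\|\boldsymbol{\lambda}_t\|^2).$$

We then sum the inequality of Lemma 2 over all iterations, and the distance terms telescope. Since $\mathbf{x}_1 = \mathbf{0}$ and $\boldsymbol{\lambda}_1 = \mathbf{0}$, what remains is at most $(\|\mathbf{x}\|^2 + \|\boldsymbol{\lambda}\|^2)/(2\eta)$. Using $\|\mathbf{x}\| \le R$ completes the proof. ■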
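The two gradient bounds used in this proof can be sanity-checked numerically. The sketch below draws random gradients of norm at most $G$, constraint values in $[-D, D]$ (the range under which the second bound holds), and nonnegative multipliers, then checks both inequalities. The sampling scheme and function names are our own assumptions, meant only as an illustration.

```python
import math
import random


def random_vector(rng, dim, max_norm):
    """Random vector with Euclidean norm at most max_norm."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    r = max_norm * rng.random()
    return [c * r / norm for c in v]


def check_gradient_bounds(m, dim, G, D, delta, eta, trials=1000, seed=0):
    """Return True if both gradient bounds of Proposition 3 hold on all random draws."""
    rng = random.Random(seed)
    for _ in range(trials):
        grad_f = random_vector(rng, dim, G)
        grad_gs = [random_vector(rng, dim, G) for _ in range(m)]
        g_vals = [rng.uniform(-D, D) for _ in range(m)]
        lam = [rng.expovariate(1.0) for _ in range(m)]
        lam_sq = sum(li * li for li in lam)
        gx = list(grad_f)                       # grad_x L_t = grad f_t + sum_i lam_i grad g_i
        for li, gg in zip(lam, grad_gs):
            gx = [u + li * v for u, v in zip(gx, gg)]
        glam = [gv - eta * delta * li for gv, li in zip(g_vals, lam)]  # grad_lambda L_t
        if sum(u * u for u in gx) > (m + 1) * G**2 * (1 + lam_sq):
            return False
        if sum(u * u for u in glam) > 2 * m * (D**2 + delta**2 * eta**2 * lam_sq):
            return False
    return True


print(check_gradient_bounds(m=3, dim=5, G=1.0, D=0.5, delta=8.0, eta=0.01))  # True
```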

The following theorem bounds the regret and the long term violation of the constraints for Algorithm 1.

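Theorem 4 Define $a = R\sqrt{(m+1)G^2 + 2mD^2}$ and set $\eta = R^2/[a\sqrt{T}]$. Assume $T$ is large enough that $2\sqrt{2}\eta(m+1) \le 1$, and choose $\delta$ such that $\delta \ge (m+1)G^2 + 2m\delta^2\eta^2$. Let $\mathbf{x}_t$, $t \in [T]$, be the sequence of solutions obtained by Algorithm 1. Then, for the optimal solution $\mathbf{x}_* = \arg\min_{\mathbf{x} \in \mathcal{K}} \sum_{t=1}^{T} f_t(\mathbf{x})$, we have

$$\sum_{t=1}^{T} f_t(\mathbf{x}_t) - f_t(\mathbf{x}_*) \le a\sqrt{T} = O(T^{1/2}), \quad \text{and}$$

$$\sum_{t=1}^{T} g_i(\mathbf{x}_t) \le \sqrt{2\big(FT + a\sqrt{T}\big)\Big(\frac{\delta R^2}{a} + \frac{ma}{R^2}\Big)}\; T^{1/4} = O(T^{3/4}), \quad i \in [m].$$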
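The parameter choices of Theorem 4 are easy to evaluate. The sketch below (function name ours) computes $a$, $\eta$, and the two bounds. It uses $\delta = 2(m+1)G^2$, as Remark 5 suggests, and checks both conditions of the theorem. The violation bound is the expression that results from maximizing over $\boldsymbol{\lambda}$ in the proof below, so treat it as our reading of that step. The value $F = 2$ in the example is hypothetical.

```python
import math


def theorem4_parameters(m, G, D, R, F, T):
    """Step size, delta and the two bounds of Theorem 4 (a sketch of the stated formulas)."""
    a = R * math.sqrt((m + 1) * G**2 + 2 * m * D**2)
    eta = R**2 / (a * math.sqrt(T))
    delta = 2 * (m + 1) * G**2                    # choice suggested in Remark 5
    if 2 * math.sqrt(2) * eta * (m + 1) > 1:
        raise ValueError("T is too small for the assumption of Theorem 4")
    if delta < (m + 1) * G**2 + 2 * m * delta**2 * eta**2:
        raise ValueError("delta violates the condition of Theorem 4")
    regret_bound = a * math.sqrt(T)               # O(T^{1/2})
    violation_bound = math.sqrt(2 * (F * T + a * math.sqrt(T))
                                * (delta * R**2 / a + m * a / R**2)) * T**0.25  # O(T^{3/4})
    return eta, delta, regret_bound, violation_bound


eta, delta, regret_bound, violation_bound = theorem4_parameters(
    m=1, G=1.0, D=0.5, R=1.0, F=2.0, T=10_000)
print(round(regret_bound, 2))  # 158.11
```

The violation bound grows like $T^{3/4}$, so the average violation per round vanishes as $T \to \infty$.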



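Proof We begin by expanding (11) using (6) and rearranging the terms, which gives

$$\sum_{t=1}^{T} [f_t(\mathbf{x}_t) - f_t(\mathbf{x})] + \sum_{i=1}^{m} \Big\{ \lambda_i \sum_{t=1}^{T} g_i(\mathbf{x}_t) - \sum_{t=1}^{T} \lambda_t^i g_i(\mathbf{x}) - \frac{\delta\eta T}{2}\lambda_i^2 \Big\} + \frac{\delta\eta}{2}\sum_{t=1}^{T}\|\boldsymbol{\lambda}_t\|^2$$
$$\le \frac{R^2 + \|\boldsymbol{\lambda}\|^2}{2\eta} + \frac{\eta T}{2}\big((m+1)G^2 + 2mD^2\big) + \frac{\eta}{2}\big((m+1)G^2 + 2m\delta^2\eta^2\big)\sum_{t=1}^{T}\|\boldsymbol{\lambda}_t\|^2.$$

Since $\delta \ge (m+1)G^2 + 2m\delta^2\eta^2$, we can drop the $\|\boldsymbol{\lambda}_t\|^2$ terms from both sides of the above inequality. Moving $\|\boldsymbol{\lambda}\|^2/(2\eta)$ to the left-hand side, we obtain

$$\sum_{t=1}^{T} [f_t(\mathbf{x}_t) - f_t(\mathbf{x})] + \sum_{i=1}^{m} \Big\{ \lambda_i \sum_{t=1}^{T} g_i(\mathbf{x}_t) - \Big(\frac{\delta\eta T}{2} + \frac{1}{2\eta}\Big)\lambda_i^2 \Big\} \le \sum_{t=1}^{T}\sum_{i=1}^{m} \lambda_t^i g_i(\mathbf{x}) + \frac{R^2}{2\eta} + \frac{\eta T}{2}\big((m+1)G^2 + 2mD^2\big).$$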
The left-hand side of the above inequality consists of two terms. The first measures the difference between the cumulative loss of Algorithm 1 and that of the comparator. The second contains the constraint functions weighted by the corresponding Lagrange multipliers, and will be used to bound the long term violation of the constraints. We maximize the left-hand side over $\boldsymbol{\lambda} \in [0, +\infty)^m$, using $\max_{\lambda_i \ge 0}\{\lambda_i s - c\lambda_i^2\} = [s]_+^2/(4c)$ and $1 \le m$. This gives



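$$\sum_{t=1}^{T} [f_t(\mathbf{x}_t) - f_t(\mathbf{x})] + \sum_{i=1}^{m} \frac{\big[\sum_{t=1}^{T} g_i(\mathbf{x}_t)\big]_+^2}{2(\delta\eta T + m/\eta)} \le \sum_{t=1}^{T}\sum_{i=1}^{m} \lambda_t^i g_i(\mathbf{x}) + \frac{R^2}{2\eta} + \frac{\eta T}{2}\big((m+1)G^2 + 2mD^2\big).$$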



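Set $\mathbf{x} = \mathbf{x}_*$. Since $\mathbf{x}_* \in \mathcal{K}$, we have $g_i(\mathbf{x}_*) \le 0$ for $i \in [m]$. Together with $\lambda_t^i \ge 0$, this makes the first term on the right-hand side nonpositive, and the inequality becomes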



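$$\sum_{t=1}^{T} f_t(\mathbf{x}_t) - f_t(\mathbf{x}_*) + \sum_{i=1}^{m} \frac{\big[\sum_{t=1}^{T} g_i(\mathbf{x}_t)\big]_+^2}{2(\delta\eta T + m/\eta)} \le \frac{R^2}{2\eta} + \frac{\eta T}{2}\big((m+1)G^2 + 2mD^2\big).$$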



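The first part of the theorem follows from the choice of $\eta$. With this choice, $R^2/(2\eta) = \frac{\eta T}{2}\big((m+1)G^2 + 2mD^2\big) = a\sqrt{T}/2$, and the second term on the left-hand side is nonnegative. For the second part, we replace the regret term by its lower bound $\sum_{t=1}^{T} f_t(\mathbf{x}_t) - f_t(\mathbf{x}_*) \ge -FT$. ■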



Remark 5 The quadratic regularizer $\delta\eta\|\boldsymbol{\lambda}\|^2/2$ turns the expression $\lambda_i \sum_{t=1}^{T} g_i(\mathbf{x}_t)$ into $\big[\sum_{t=1}^{T} g_i(\mathbf{x}_t)\big]_+^2$, which yields the bound on the violation of the constraints. The regularizer also lets us work with unbounded $\boldsymbol{\lambda}$. It cancels the contribution of the $\|\boldsymbol{\lambda}_t\|$ terms that come from the loss function and from the bound on the gradients $\|\nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda})\|$.

The condition on $\delta$ in Theorem 4 holds whenever

$$\frac{2G^2}{1/(m+1) + \sqrt{(m+1)^{-2} - 8G^2\eta^2}} \le \delta \le \frac{1/(m+1) + \sqrt{(m+1)^{-2} - 8G^2\eta^2}}{4\eta^2}. \qquad (12)$$

Indeed, (12) is the solution set of $2(m+1)\eta^2\delta^2 - \delta + (m+1)G^2 \le 0$, which implies the condition because $m \le m+1$. When $T$ is large enough (i.e., $\eta$ is small enough), we can therefore simply set $\delta = 2(m+1)G^2$, which satisfies (12).
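Condition (12) can be evaluated directly. The sketch below (function names ours) returns the interval in (12). On an example, it confirms that $\delta = 2(m+1)G^2$ lies inside the interval for a small step size. It also confirms that values of $\delta$ in the interval satisfy $\delta \ge (m+1)G^2 + 2m\delta^2\eta^2$.

```python
import math


def delta_interval(m, G, eta):
    """Interval (12) for delta; None if the square root is undefined (eta too large)."""
    disc = (m + 1) ** -2 - 8 * G**2 * eta**2
    if disc < 0:
        return None
    root = math.sqrt(disc)
    lower = 2 * G**2 / (1 / (m + 1) + root)
    upper = (1 / (m + 1) + root) / (4 * eta**2)
    return lower, upper


def satisfies_theorem4(delta, m, G, eta):
    """The condition on delta stated in Theorem 4."""
    return delta >= (m + 1) * G**2 + 2 * m * delta**2 * eta**2


m, G, eta = 1, 1.0, 0.01
lower, upper = delta_interval(m, G, eta)
print(lower <= 2 * (m + 1) * G**2 <= upper)  # True
```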

Inspecting Lemma 2 shows that bounded gradients are essential for the bounds on Algorithm 1 in Theorem 4. At each iteration, $\boldsymbol{\lambda}_t$ is projected onto the unbounded set $\mathbb{R}_+^m$. The gradients of $f_t$ and $g_i$ themselves nevertheless remain bounded. Since $\mathcal{K}$ is compact and the functions $f_t(\mathbf{x})$ and $g_i(\mathbf{x})$, $i \in [m]$, are convex, boundedness of the functions implies boundedness of their gradients (Bertsekas et al., 2003, Proposition 4.2.3).



3.2 An Efficient Algorithm with $O(T^{3/4})$ Regret Bound and without Violation of Constraints

In this subsection we generalize Algorithm 1 so that the constraints are satisfied in the long run. To produce solutions $\{\mathbf{x}_t, t \in [T]\}$ that satisfy the long term constraints $\sum_{t=1}^{T} g_i(\mathbf{x}_t) \le 0$, $i \in [m]$, we make two modifications to Algorithm 1.

First, instead of handling all $m$ constraints, we consider the single constraint $g(\mathbf{x}) = \max_{i \in [m]} g_i(\mathbf{x})$. Zero long term violation of $g(\mathbf{x}) \le 0$ guarantees that every $g_i(\cdot)$, $i \in [m]$, is also satisfied in the long run, since $\sum_{t=1}^{T} g_i(\mathbf{x}_t) \le \sum_{t=1}^{T} g(\mathbf{x}_t)$.

Second, we redefine $\mathcal{L}_t(\cdot, \cdot)$ as

$$\mathcal{L}_t(\mathbf{x}, \lambda) = f_t(\mathbf{x}) + \lambda\big(g(\mathbf{x}) + \gamma\big) - \frac{\eta\delta}{2}\lambda^2, \qquad (13)$$

where $\gamma > 0$ will be decided later. This amounts to imposing the constraint $g(\mathbf{x}) \le -\gamma$, which is tighter than $g(\mathbf{x}) \le 0$. The algorithm works with this tighter constraint, and the tighter constraint may be violated in many trials. Even so, the resulting sequence of solutions satisfies the original long term constraint $\sum_{t=1}^{T} g(\mathbf{x}_t) \le 0$.
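The only change to the updates of Algorithm 1 is in the gradients of (13). Here is a minimal sketch with hypothetical helper names. Since $g(\mathbf{x}) = \max_i g_i(\mathbf{x})$ may be non-differentiable, the sketch uses the gradient of an active constraint as a subgradient. This is an assumption of the sketch: the text does not specify how ties are handled.

```python
def modified_gradients(grad_f, gs, grad_gs, x, lam, gamma, eta, delta):
    """Gradients of L_t(x, lam) = f_t(x) + lam * (g(x) + gamma) - (eta * delta / 2) * lam**2,
    where g(x) = max_i g_i(x) and lam is a single nonnegative scalar."""
    values = [g(x) for g in gs]
    active = max(range(len(gs)), key=lambda i: values[i])   # an index attaining the max
    grad_x = [u + lam * v for u, v in zip(grad_f(x), grad_gs[active](x))]
    grad_lam = values[active] + gamma - eta * delta * lam
    return grad_x, grad_lam


# Hypothetical 1-D example: g_1(x) = x - 1, g_2(x) = -x - 1 (i.e., |x| <= 1), f_t(x) = -x.
gs = [lambda x: x[0] - 1.0, lambda x: -x[0] - 1.0]
grad_gs = [lambda x: [1.0], lambda x: [-1.0]]
gx, gl = modified_gradients(lambda x: [-1.0], gs, grad_gs, [0.5], lam=2.0,
                            gamma=0.1, eta=0.01, delta=4.0)
print(gx, round(gl, 6))  # [1.0] -0.48
```

Plugging these gradients into steps 6 and 7 of Algorithm 1, with a scalar $\lambda_t$ projected onto $[0, +\infty)$, gives the variant analyzed in this subsection.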

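Before proceeding, we state a fact about the Lipschitz continuity of $g(\mathbf{x})$.

Proposition 6 Assume that the functions $g_i(\cdot)$, $i \in [m]$, are Lipschitz continuous with constant $G$. Then $g(\mathbf{x}) = \max_{i \in [m]} g_i(\mathbf{x})$ is Lipschitz continuous with constant $G$, that is,

$$|g(\mathbf{x}) - g(\mathbf{x}')| \le G\|\mathbf{x} - \mathbf{x}'\| \quad \text{for any } \mathbf{x} \in \mathcal{B} \text{ and } \mathbf{x}' \in \mathcal{B}.$$

Proof First, we rewrite $g(\mathbf{x}) = \max_{i \in [m]} g_i(\mathbf{x})$ as $g(\mathbf{x}) = \max_{\boldsymbol{\alpha} \in \Delta_m} \sum_{i=1}^{m} \alpha_i g_i(\mathbf{x})$. Here $\Delta_m$ is the $m$-simplex, $\Delta_m = \{\boldsymbol{\alpha} \in \mathbb{R}_+^m : \sum_{i=1}^{m} \alpha_i = 1\}$. Then

$$|g(\mathbf{x}) - g(\mathbf{x}')| = \Big|\max_{\boldsymbol{\alpha} \in \Delta_m} \sum_{i=1}^{m} \alpha_i g_i(\mathbf{x}) - \max_{\boldsymbol{\alpha} \in \Delta_m} \sum_{i=1}^{m} \alpha_i g_i(\mathbf{x}')\Big| \le \max_{\boldsymbol{\alpha} \in \Delta_m} \Big|\sum_{i=1}^{m} \alpha_i g_i(\mathbf{x}) - \sum_{i=1}^{m} \alpha_i g_i(\mathbf{x}')\Big| \le \max_{i \in [m]} |g_i(\mathbf{x}) - g_i(\mathbf{x}')| \le G\|\mathbf{x} - \mathbf{x}'\|,$$

where the last inequality follows from the Lipschitz continuity of $g_i(\mathbf{x})$, $i \in [m]$. ■

To obtain zero violation of the constraints in the long run, we make the following assumption about the constraint function $g(\mathbf{x})$.

Assumption 1 Let $\mathcal{K}' \subseteq \mathcal{K}$ be the convex set $\mathcal{K}' = \{\mathbf{x} \in \mathbb{R}^d : g(\mathbf{x}) + \gamma \le 0\}$, where $\gamma > 0$. We assume that the norm of the gradient of $g(\mathbf{x})$ is bounded below on the boundary of $\mathcal{K}'$, that is,

$$\min_{g(\mathbf{x}) + \gamma = 0} \|\nabla g(\mathbf{x})\| \ge \sigma.$$

A direct consequence of Assumption 1 is that shrinking the domain from $\mathcal{K}$ to $\mathcal{K}'$ changes the optimal value of $\min_{\mathbf{x} \in \mathcal{K}} f(\mathbf{x})$ only slightly, as the following theorem shows.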
Theorem 7 Let $\mathbf{x}_*$ and $\mathbf{x}_\gamma$ be the optimal solutions to $\min_{g(\mathbf{x}) \le 0} f(\mathbf{x})$ and $\min_{g(\mathbf{x}) \le -\gamma} f(\mathbf{x})$, respectively, where $f(\mathbf{x}) = \sum_{t=1}^{T} f_t(\mathbf{x})$ and $\gamma > 0$. Then

$$|f(\mathbf{x}_*) - f(\mathbf{x}_\gamma)| \le \frac{G}{\sigma}\gamma T.$$

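Proof The problem $\min_{g(\mathbf{x}) \le -\gamma} f(\mathbf{x}) = \min_{g(\mathbf{x}) \le -\gamma} \sum_{t=1}^{T} f_t(\mathbf{x})$ can also be written in minimax form as

$$f(\mathbf{x}_\gamma) = \min_{\mathbf{x} \in \mathcal{B}} \max_{\lambda \in \mathbb{R}_+} \sum_{t=1}^{T} f_t(\mathbf{x}) + \lambda\big(g(\mathbf{x}) + \gamma\big), \qquad (14)$$

where we use $\mathcal{K}' \subseteq \mathcal{K} \subseteq \mathcal{B}$. Let $\mathbf{x}_\gamma$ and $\lambda_\gamma$ denote the optimal solutions to (14). Then

$$f(\mathbf{x}_\gamma) = \min_{\mathbf{x} \in \mathcal{B}} \max_{\lambda \in \mathbb{R}_+} \sum_{t=1}^{T} f_t(\mathbf{x}) + \lambda\big(g(\mathbf{x}) + \gamma\big) = \min_{\mathbf{x} \in \mathcal{B}} \sum_{t=1}^{T} f_t(\mathbf{x}) + \lambda_\gamma\big(g(\mathbf{x}) + \gamma\big) \le \sum_{t=1}^{T} f_t(\mathbf{x}_*) + \lambda_\gamma\big(g(\mathbf{x}_*) + \gamma\big) \le \sum_{t=1}^{T} f_t(\mathbf{x}_*) + \lambda_\gamma\gamma.$$

The second equality follows from the definition of the optimal pair $(\mathbf{x}_\gamma, \lambda_\gamma)$. The last inequality holds because $\mathbf{x}_*$ is feasible, that is, $g(\mathbf{x}_*) \le 0$, and because $\lambda_\gamma \ge 0$.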

To bound |/(x^) — /(x*)|, we need to bound A-y. Since x^ is the minimizer of ()14p . from 
the optimality condition we have 



T 



^V/t(x^) = A^V5(x^). (15) 



t=i 



By setting v = — y^ ?l^ V/^fx^), we can simp l ifv ([15]) as X^Vg{'x.y) = v. From the KKT 
optimality condition (jBovd and Vandenberghel . l2004l ). if g{x^) + ^ < then we have A^ = 0; 
otherwise according to Assumption [1] we can bound A^ by 

||v|| GT 
- ||V5(x,)|| - — • 

We complete the proof by applying the fact /(x*) < /(x^) < /(x*) + A^7. ■ 

As indicated by Theorem 7, when $\gamma$ is small, we expect the difference between the two optimal values $f(x_*)$ and $f(x_\gamma)$ to be small. Using the result from Theorem 7, in the following theorem we show that by running Algorithm 1 on the modified convex-concave functions defined in (13), we are able to obtain an $O(T^{3/4})$ regret bound and a zero bound on the violation of constraints in the long run.
Theorem 8 Set $a = 2R\sqrt{2G^2 + 3(D^2 + b^2)}$, $\eta = R^2/(a\sqrt{T})$, and $\delta = 4G^2$. Let $x_t, t \in [T]$ be the sequence of solutions obtained by Algorithm 1 with the functions defined in (13) with $\gamma = bT^{-1/4}$ and $b = 2\sqrt{F(\delta R^2a^{-1} + aR^{-2})}$. Let $x_*$ be the optimal solution to $\min_{x\in\mathcal{K}}\sum_{t=1}^T f_t(x)$. With sufficiently large $T$, that is, $FT \ge a\sqrt{T}$, and under Assumption 1, the solutions $x_t, t \in [T]$ satisfy the global constraint $\sum_{t=1}^T g(x_t) \le 0$ and the regret $\mathcal{R}_T$ is bounded by

$$\mathcal{R}_T = \sum_{t=1}^T f_t(x_t) - f_t(x_*) \le a\sqrt{T} + \frac{bG}{\sigma}T^{3/4} = O(T^{3/4}).$$
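Proof Let $x_\gamma$ be the optimal solution to $\min_{g(x)\le-\gamma}\sum_{t=1}^T f_t(x)$. Similar to the proof of Theorem 4, when applied to the functions in (13) we have

$$\begin{aligned} &\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x) + \lambda\sum_{t=1}^T\big(g(x_t)+\gamma\big) - \Big(\sum_{t=1}^T\lambda_t\Big)\big(g(x)+\gamma\big) - \frac{\delta\eta T}{2}\lambda^2 + \frac{\delta\eta}{2}\sum_{t=1}^T\lambda_t^2 \\ &\le \frac{R^2+\lambda^2}{2\eta} + \frac{\eta T}{2}\big(2G^2 + 3(D^2+b^2)\big) + \frac{\eta}{2}\big(2G^2 + 3\delta^2\eta^2\big)\sum_{t=1}^T\lambda_t^2, \end{aligned}$$

where we used $\gamma \le b$ to bound the gradient with respect to $\lambda$. By setting $\delta \ge 2G^2 + 3\delta^2\eta^2$, which is satisfied by $\delta = 4G^2$ whenever $\eta \le 1/(2\sqrt{6}G)$, we cancel the terms including $\lambda_t^2$ from the right hand side of the above inequality. By maximizing for $\lambda$ over the range $(0,+\infty)$ and noting that $g(x_\gamma) + \gamma \le 0$, for the optimal solution $x_\gamma$ we have

$$\sum_{t=1}^T\big[f_t(x_t) - f_t(x_\gamma)\big] + \frac{\big[\sum_{t=1}^T(g(x_t)+\gamma)\big]_+^2}{2(\delta\eta T + 1/\eta)} \le \frac{R^2}{2\eta} + \frac{\eta T}{2}\big(2G^2 + 3(D^2+b^2)\big),$$

which, by substituting $\eta = R^2/(a\sqrt{T})$ and applying the lower bound for the regret $\sum_{t=1}^T f_t(x_t) - f_t(x_\gamma) \ge -FT$, yields the following inequalities

$$\sum_{t=1}^T f_t(x_t) - f_t(x_\gamma) \le a\sqrt{T} \qquad (16)$$

and

$$\sum_{t=1}^T g(x_t) \le \sqrt{2\big(FT + a\sqrt{T}\big)\Big(\frac{\delta R^2}{a} + \frac{a}{R^2}\Big)\sqrt{T}} - \gamma T, \qquad (17)$$

for the regret and the violation of the constraint, respectively. Combining (16) with the result of Theorem 7 results in $\sum_{t=1}^T f_t(x_t) \le \sum_{t=1}^T f_t(x_*) + a\sqrt{T} + (G/\sigma)\gamma T$. By choosing $\gamma = bT^{-1/4}$ we attain the desired regret bound as

$$\sum_{t=1}^T f_t(x_t) - f_t(x_*) \le a\sqrt{T} + \frac{bG}{\sigma}T^{3/4} = O(T^{3/4}).$$

To obtain the bound on the violation of constraints, we note that in (17), when $T$ is sufficiently large, that is, $FT \ge a\sqrt{T}$, we have $\sum_{t=1}^T g(x_t) \le 2\sqrt{F(\delta R^2a^{-1} + aR^{-2})}\,T^{3/4} - bT^{3/4}$. Choosing $b = 2\sqrt{F(\delta R^2a^{-1} + aR^{-2})}$ guarantees the zero bound on the violation of constraints as claimed. ■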
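The analysis above uses Algorithm 1 only through its primal-dual structure. As a rough illustration, the sketch below runs a projected gradient descent step on $x$ and a projected gradient ascent step on $\lambda$ for the shifted function $\mathcal{L}_t(x,\lambda) = f_t(x) + \lambda(g(x)+\gamma) - \frac{\delta\eta}{2}\lambda^2$. It assumes (rather than quotes) that Algorithm 1 takes exactly these two steps with a common step size $\eta$, that $\mathcal{B}$ is a Euclidean ball of radius $R$ centered at the origin, and that $x_1 = 0$, $\lambda_1 = 0$; the function names, the toy problem and the step size $R/\sqrt{T}$ in the demo are hypothetical choices, not the tuned values of Theorem 8.

```python
import math


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius} (ball centered at the origin)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def shifted_primal_dual(grad_f, g, grad_g, T, d, R, eta, delta, gamma):
    """Sketch of Algorithm 1 on L_t(x, lam) = f_t(x) + lam*(g(x)+gamma) - (delta*eta/2)*lam**2.

    grad_f(t, x): gradient of f_t at x (f_t is revealed after x_t is played).
    g, grad_g:    constraint g(x) = max_i g_i(x) and a (sub)gradient of it.
    Returns the played points x_1..x_T and the multipliers lam_1..lam_T.
    """
    x, lam = [0.0] * d, 0.0
    xs, lams = [], []
    for t in range(T):
        xs.append(x)
        lams.append(lam)
        # Descent on x with grad_x L_t(x_t, lam_t) = grad f_t(x_t) + lam_t * grad g(x_t).
        step = [fi + lam * gi for fi, gi in zip(grad_f(t, x), grad_g(x))]
        x_next = project_ball([xi - eta * si for xi, si in zip(x, step)], R)
        # Ascent on lambda with grad_lam L_t(x_t, lam_t) = g(x_t) + gamma - delta*eta*lam_t.
        lam = max(0.0, lam + eta * (g(x) + gamma - delta * eta * lam))
        x = x_next
    return xs, lams


if __name__ == "__main__":
    # Hypothetical toy problem: f_t(x) = -x1 - x2 pushes against g(x) = x1 + x2 - 1 <= 0.
    T, R, G = 10000, 2.0, math.sqrt(2.0)
    xs, _ = shifted_primal_dual(
        grad_f=lambda t, x: [-1.0, -1.0],
        g=lambda x: x[0] + x[1] - 1.0,
        grad_g=lambda x: [1.0, 1.0],
        T=T, d=2, R=R, eta=R / math.sqrt(T), delta=4 * G ** 2, gamma=T ** -0.25,
    )
    print("cumulative violation:", sum(p[0] + p[1] - 1.0 for p in xs))
```

Raising $\gamma$ tightens the constraint seen by the dual variable, which is the mechanism Theorem 8 uses to push the cumulative violation below zero.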



4. A Mirror Prox Based Approach 

The bound for the violation of constraints for Algorithm 1 is unsatisfactory since it is significantly worse than $O(\sqrt{T})$. In this section, we pursue a different approach that is based on the mirror prox method in Nemirovski (2005) to improve the bound for the violation of constraints. The basic idea is that solving (5) can be reduced to the problem of approximating a saddle point $(x,\lambda) \in \mathcal{B}\times[0,\infty)^m$ by solving the associated variational inequality.
We first define an auxiliary function $\mathcal{F}(x,\lambda)$ as

$$\mathcal{F}(x,\lambda) = \sum_{i=1}^m\Big\{\lambda_i g_i(x) - \frac{\delta\eta}{2}\lambda_i^2\Big\}.$$



In order to successfully apply the mirror prox method, we use the fact that any closed convex domain can be written as an intersection of (possibly infinitely many) halfspaces, and make the following assumption:

Assumption 2 We assume that $g_i(x), i \in [m]$ are linear, that is, $\mathcal{K} = \{x \in \mathbb{R}^d : g_i(x) = x^\top a_i - b_i \le 0, i \in [m]\}$, where $a_i \in \mathbb{R}^d$ is a normalized vector with $\|a_i\| = 1$ and $b_i \in \mathbb{R}$.



The following proposition shows that under Assumption 2 the function $\mathcal{F}(x,\lambda)$ has a Lipschitz continuous gradient, which is the basis for the application of the mirror prox method.

Proposition 9 Under Assumption 2, $\mathcal{F}(x,\lambda)$ has a Lipschitz continuous gradient, that is,

$$\|\nabla_x\mathcal{F}(x,\lambda) - \nabla_x\mathcal{F}(x',\lambda')\|^2 + \|\nabla_\lambda\mathcal{F}(x,\lambda) - \nabla_\lambda\mathcal{F}(x',\lambda')\|^2 \le 2(m + \delta^2\eta^2)\big(\|x - x'\|^2 + \|\lambda - \lambda'\|^2\big).$$

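Proof Let $A = [a_1, \ldots, a_m] \in \mathbb{R}^{d\times m}$ and $b = (b_1, \ldots, b_m)^\top$, so that $\nabla_x\mathcal{F}(x,\lambda) = A\lambda$ and $\nabla_\lambda\mathcal{F}(x,\lambda) = A^\top x - b - \delta\eta\lambda$. Then

$$\begin{aligned} &\|\nabla_x\mathcal{F}(x,\lambda) - \nabla_x\mathcal{F}(x',\lambda')\|^2 + \|\nabla_\lambda\mathcal{F}(x,\lambda) - \nabla_\lambda\mathcal{F}(x',\lambda')\|^2 \\ &= \Big\|\sum_{i=1}^m(\lambda_i - \lambda_i')a_i\Big\|^2 + \sum_{i=1}^m\big(a_i^\top(x - x') - \delta\eta(\lambda_i - \lambda_i')\big)^2 \\ &\le \|A(\lambda - \lambda')\|^2 + 2\|A^\top(x - x')\|^2 + 2\delta^2\eta^2\|\lambda - \lambda'\|^2 \\ &\le 2\sigma_{\max}^2(A)\|x - x'\|^2 + \big(\sigma_{\max}^2(A) + 2\delta^2\eta^2\big)\|\lambda - \lambda'\|^2. \end{aligned}$$

Since

$$\sigma_{\max}(A) = \sqrt{\lambda_{\max}(A^\top A)} \le \sqrt{\mathrm{Tr}(A^\top A)} = \sqrt{m},$$

we have $\sigma_{\max}^2(A) \le m$, leading to the desired result. ■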
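The inequality of Proposition 9 can be sanity-checked numerically. The sketch below draws random unit vectors $a_i$ (as required by Assumption 2), random offsets $b_i$, and random pairs $(x,\lambda)$, $(x',\lambda')$, and reports the largest observed ratio between the left and right hand sides. Random sampling only illustrates the bound and does not prove it; the helper names are hypothetical.

```python
import math
import random


def grad_F(A, b, x, lam, delta, eta):
    """Gradients of F(x, lam) = sum_i lam_i*(a_i^T x - b_i) - (delta*eta/2)*sum_i lam_i**2.

    A is a list of the m constraint vectors a_i, each of length d."""
    m, d = len(A), len(x)
    gx = [sum(lam[i] * A[i][j] for i in range(m)) for j in range(d)]
    glam = [sum(A[i][j] * x[j] for j in range(d)) - b[i] - delta * eta * lam[i] for i in range(m)]
    return gx, glam


def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def worst_ratio_prop9(d, m, delta, eta, trials=2000, seed=0):
    """Largest observed LHS/RHS ratio in Proposition 9; the proposition says it is <= 1."""
    rng = random.Random(seed)
    A = []
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in a))
        A.append([v / norm for v in a])  # ||a_i|| = 1 as in Assumption 2
    b = [rng.gauss(0.0, 1.0) for _ in range(m)]
    worst = 0.0
    for _ in range(trials):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(d)]
        l1 = [abs(rng.gauss(0.0, 1.0)) for _ in range(m)]
        l2 = [abs(rng.gauss(0.0, 1.0)) for _ in range(m)]
        gx1, gl1 = grad_F(A, b, x1, l1, delta, eta)
        gx2, gl2 = grad_F(A, b, x2, l2, delta, eta)
        lhs = sq_dist(gx1, gx2) + sq_dist(gl1, gl2)
        rhs = 2.0 * (m + delta ** 2 * eta ** 2) * (sq_dist(x1, x2) + sq_dist(l1, l2))
        worst = max(worst, lhs / rhs)
    return worst
```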

Algorithm 2 shows the detailed steps of the mirror prox based algorithm for online convex optimization with long term constraints defined in (5).

Algorithm 2 Prox Method with Long Term Constraints
1: Input: constraints $g_i(x) \le 0, i \in [m]$, step size $\eta$, and constant $\delta$
2: Initialization: $z_1 = 0$ and $\mu_1 = 0$
3: for $t = 1, 2, \ldots, T$ do
4: Compute the solution for $x_t$ and $\lambda_t$ as
$$x_t = \Pi_{\mathcal{B}}\big(z_t - \eta\nabla_x\mathcal{F}(z_t,\mu_t)\big), \qquad \lambda_t = \Pi_{[0,+\infty)^m}\big(\mu_t + \eta\nabla_\lambda\mathcal{F}(z_t,\mu_t)\big)$$
5: Submit solution $x_t$
6: Receive the convex function $f_t(x)$ and experience loss $f_t(x_t)$
7: Compute $\mathcal{L}_t(x,\lambda) = f_t(x) + \mathcal{F}(x,\lambda) = f_t(x) + \sum_{i=1}^m\{\lambda_i g_i(x) - \frac{\delta\eta}{2}\lambda_i^2\}$
8: Update $z_t$ and $\mu_t$ by
$$z_{t+1} = \Pi_{\mathcal{B}}\big(z_t - \eta\nabla_x\mathcal{L}_t(x_t,\lambda_t)\big), \qquad \mu_{t+1} = \Pi_{[0,+\infty)^m}\big(\mu_t + \eta\nabla_\lambda\mathcal{L}_t(x_t,\lambda_t)\big)$$
9: end for

Compared to Algorithm 1, there are two key features of Algorithm 2. First, it introduces auxiliary variables $z_t$ and $\mu_t$ besides the variables $x_t$ and $\lambda_t$. At each iteration $t$, it first computes the solutions $x_t$ and $\lambda_t$ based on the auxiliary variables $z_t$ and $\mu_t$; it then updates the auxiliary variables based on the gradients computed from $x_t$ and $\lambda_t$. Second, two different functions are used for updating $(x_t,\lambda_t)$ and $(z_t,\mu_t)$: the function $\mathcal{F}(x,\lambda)$ is used for computing the solutions $x_t$ and $\lambda_t$, while the function $\mathcal{L}_t(x,\lambda)$ is used for updating the auxiliary variables $z_t$ and $\mu_t$.
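For the linear constraints of Assumption 2, the two-stage structure of Algorithm 2 can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes $\mathcal{B}$ is a Euclidean ball of radius $R$ centered at the origin, uses Euclidean projections for both variables, starts from $z_1 = 0$ and $\mu_1 = 0$, and the names `mirror_prox_ltc` and `grad_f` are hypothetical.

```python
import math


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= radius else [v * radius / norm for v in x]


def mirror_prox_ltc(grad_f, A, b, T, R, eta, delta):
    """Sketch of Algorithm 2 for linear constraints a_i^T x - b_i <= 0 (rows of A).

    grad_f(t, x) returns the gradient of f_t at x. Returns the played points x_1..x_T."""
    m, d = len(A), len(A[0])

    def grads_F(x, lam):
        # Gradients of F(x, lam) = sum_i lam_i*(a_i^T x - b_i) - (delta*eta/2)*||lam||^2.
        gx = [sum(lam[i] * A[i][j] for i in range(m)) for j in range(d)]
        gl = [sum(A[i][j] * x[j] for j in range(d)) - b[i] - delta * eta * lam[i] for i in range(m)]
        return gx, gl

    z, mu = [0.0] * d, [0.0] * m
    played = []
    for t in range(T):
        # Step 4: (x_t, lam_t) from a gradient step of F at the auxiliary point (z_t, mu_t).
        gx, gl = grads_F(z, mu)
        x = project_ball([zj - eta * gj for zj, gj in zip(z, gx)], R)
        lam = [max(0.0, mi + eta * gi) for mi, gi in zip(mu, gl)]
        played.append(x)  # Step 5: submit x_t; f_t is revealed afterwards (step 6)
        # Steps 7-8: update (z_t, mu_t) with the gradient of L_t = f_t + F at (x_t, lam_t).
        gx, gl = grads_F(x, lam)
        fx = grad_f(t, x)
        z = project_ball([zj - eta * (fj + gj) for zj, fj, gj in zip(z, fx, gx)], R)
        mu = [max(0.0, mi + eta * gi) for mi, gi in zip(mu, gl)]
    return played
```

Note that the loss gradient enters only the update of $(z_t,\mu_t)$, while the played point $x_t$ is produced from $\mathcal{F}$ alone, matching the two key features listed above.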

Our analysis is based on Lemma 3.1 from Nemirovski (2005), which is restated here for completeness.

Lemma 10 Let $B(x,x')$ be a Bregman distance function that has modulus $\alpha$ with respect to a norm $\|\cdot\|$, that is, $B(x,x') \ge \alpha\|x - x'\|^2/2$. Given $u \in \mathcal{B}$, $a$, and $b$, we set

$$w = \arg\min_{x\in\mathcal{B}}\ \eta a^\top(x - u) + B(x,u), \qquad u_+ = \arg\min_{x\in\mathcal{B}}\ \eta b^\top(x - u) + B(x,u).$$

Then for any $x \in \mathcal{B}$ and $\eta > 0$, we have

$$\eta b^\top(w - x) \le B(x,u) - B(x,u_+) + \frac{\eta^2}{2\alpha}\|a - b\|_*^2 - \frac{\alpha}{2}\|w - u\|^2,$$

where $\|\cdot\|_*$ denotes the dual norm of $\|\cdot\|$.

We equip $\mathcal{B}\times[0,+\infty)^m$ with the norm $\|\cdot\|$ defined as

$$\|(z,\mu)\|^2 = \|z\|_2^2 + \|\mu\|_2^2,$$

where $\|\cdot\|_2$ is the Euclidean norm defined separately for each domain. It is immediately seen that the Bregman distance function defined as

$$B\big((z_t,\mu_t),(z_{t+1},\mu_{t+1})\big) = \frac{1}{2}\|z_t - z_{t+1}\|^2 + \frac{1}{2}\|\mu_t - \mu_{t+1}\|^2$$

has modulus $\alpha = 1$ with respect to the norm $\|\cdot\|$.
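In the Euclidean case used here ($B(x,u) = \frac{1}{2}\|x-u\|^2$, $\alpha = 1$), the two arg-min steps of Lemma 10 are projections, $w = \Pi(u - \eta a)$ and $u_+ = \Pi(u - \eta b)$. The sketch below evaluates both sides of the inequality on random draws over a ball of radius $R$ and returns the largest observed gap; it is a numerical illustration under these assumptions, not a proof, and the helper names are hypothetical.

```python
import math
import random


def project_ball(x, radius):
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= radius else [v * radius / norm for v in x]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def lemma10_gap(d=3, R=1.0, trials=2000, seed=0):
    """Largest observed value of LHS - RHS in Lemma 10 (Euclidean case, alpha = 1).

    A nonpositive result (up to rounding) is consistent with the lemma."""
    rng = random.Random(seed)

    def point_in_ball():
        return project_ball([rng.uniform(-2.0 * R, 2.0 * R) for _ in range(d)], R)

    worst = -math.inf
    for _ in range(trials):
        u, x = point_in_ball(), point_in_ball()
        a = [rng.gauss(0.0, 3.0) for _ in range(d)]
        b = [rng.gauss(0.0, 3.0) for _ in range(d)]
        eta = rng.uniform(0.01, 2.0)
        w = project_ball([ui - eta * ai for ui, ai in zip(u, a)], R)       # arg-min with a
        u_plus = project_ball([ui - eta * bi for ui, bi in zip(u, b)], R)  # arg-min with b
        lhs = eta * dot(b, [wi - xi for wi, xi in zip(w, x)])
        rhs = (0.5 * sq_dist(x, u) - 0.5 * sq_dist(x, u_plus)
               + 0.5 * eta ** 2 * sq_dist(a, b) - 0.5 * sq_dist(w, u))
        worst = max(worst, lhs - rhs)
    return worst
```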

To analyze the mirror prox algorithm, we begin with a simple lemma, which is a direct application of Lemma 10 to the updating rules of Algorithm 2.

Lemma 11 If $\eta \le 1$ and $\eta(m + \delta^2\eta^2) \le \frac{1}{4}$ hold, we have

$$\mathcal{L}_t(x_t,\lambda) - \mathcal{L}_t(x,\lambda_t) \le \frac{\|x - z_t\|^2 - \|x - z_{t+1}\|^2}{2\eta} + \frac{\|\lambda - \mu_t\|^2 - \|\lambda - \mu_{t+1}\|^2}{2\eta} + \eta\|\nabla f_t(x_t)\|^2.$$

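Proof To apply Lemma 10, we define $u$, $w$, $u_+$, $a$ and $b$ as follows:

$$u = (z_t,\mu_t), \quad u_+ = (z_{t+1},\mu_{t+1}), \quad w = (x_t,\lambda_t),$$

$$a = \big(\nabla_x\mathcal{F}(z_t,\mu_t), -\nabla_\lambda\mathcal{F}(z_t,\mu_t)\big), \quad b = \big(\nabla_x\mathcal{L}_t(x_t,\lambda_t), -\nabla_\lambda\mathcal{L}_t(x_t,\lambda_t)\big).$$

Using Lemmas 2 and 10 (with $\alpha = 1$, after dividing by $\eta$), we have

$$\begin{aligned} &\mathcal{L}_t(x_t,\lambda) - \mathcal{L}_t(x,\lambda_t) - \frac{\|x - z_t\|^2 - \|x - z_{t+1}\|^2}{2\eta} - \frac{\|\lambda - \mu_t\|^2 - \|\lambda - \mu_{t+1}\|^2}{2\eta} \\ &\le \underbrace{\frac{\eta}{2}\Big\{\|\nabla_x\mathcal{F}(z_t,\mu_t) - \nabla_x\mathcal{L}_t(x_t,\lambda_t)\|^2 + \|\nabla_\lambda\mathcal{F}(z_t,\mu_t) - \nabla_\lambda\mathcal{L}_t(x_t,\lambda_t)\|^2\Big\}}_{(\mathrm{I})} \; \underbrace{- \frac{1}{2\eta}\Big\{\|z_t - x_t\|^2 + \|\mu_t - \lambda_t\|^2\Big\}}_{(\mathrm{II})}. \end{aligned}$$

By expanding the gradient terms, using $\nabla_x\mathcal{L}_t = \nabla f_t + \nabla_x\mathcal{F}$ and $\nabla_\lambda\mathcal{L}_t = \nabla_\lambda\mathcal{F}$, and applying the inequality $(p+q)^2 \le 2(p^2+q^2)$, we upper bound (I) as:

$$\begin{aligned} (\mathrm{I}) &\le \frac{\eta}{2}\Big\{2\|\nabla f_t(x_t)\|^2 + 2\|\nabla_x\mathcal{F}(z_t,\mu_t) - \nabla_x\mathcal{F}(x_t,\lambda_t)\|^2 + \|\nabla_\lambda\mathcal{F}(z_t,\mu_t) - \nabla_\lambda\mathcal{F}(x_t,\lambda_t)\|^2\Big\} \\ &\le \eta\|\nabla f_t(x_t)\|^2 + \eta\Big\{\|\nabla_x\mathcal{F}(z_t,\mu_t) - \nabla_x\mathcal{F}(x_t,\lambda_t)\|^2 + \|\nabla_\lambda\mathcal{F}(z_t,\mu_t) - \nabla_\lambda\mathcal{F}(x_t,\lambda_t)\|^2\Big\} \\ &\le \eta\|\nabla f_t(x_t)\|^2 + 2\eta(m + \delta^2\eta^2)\Big\{\|z_t - x_t\|^2 + \|\mu_t - \lambda_t\|^2\Big\}, \end{aligned} \qquad (18)$$

where the last inequality follows from Proposition 9. Combining (II) with (18) results in

$$\begin{aligned} &\mathcal{L}_t(x_t,\lambda) - \mathcal{L}_t(x,\lambda_t) - \frac{\|x - z_t\|^2 - \|x - z_{t+1}\|^2}{2\eta} - \frac{\|\lambda - \mu_t\|^2 - \|\lambda - \mu_{t+1}\|^2}{2\eta} \\ &\le \eta\|\nabla f_t(x_t)\|^2 + \Big(2\eta(m + \delta^2\eta^2) - \frac{1}{2\eta}\Big)\Big\{\|z_t - x_t\|^2 + \|\mu_t - \lambda_t\|^2\Big\}. \end{aligned}$$

We complete the proof by rearranging the terms and noting that the last coefficient is nonpositive, since $\eta(m + \delta^2\eta^2) \le 1/4$ and $\eta \le 1$ give $4\eta^2(m + \delta^2\eta^2) \le \eta \le 1$. ■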
Theorem 12 Set $\eta = T^{-1/3}$ and $\delta = T^{-1/3}$. Let $x_t, t \in [T]$ be the sequence of solutions obtained by Algorithm 2. Then for $T \ge 64(m+1)^3$ we have

$$\sum_{t=1}^T f_t(x_t) - f_t(x_*) \le O(T^{2/3}) \quad \text{and} \quad \sum_{t=1}^T g_i(x_t) \le O(T^{2/3}), \; i \in [m].$$
t=i t=i 

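Proof Similar to the proof of Theorem 4, by summing the bound in Lemma 11 for all rounds $t = 1, \ldots, T$ (with $z_1 = 0$ and $\mu_1 = 0$) and taking the maximization over $\lambda \in \mathbb{R}^m_+$, we have the following inequality for any $x_* \in \mathcal{K}$:

$$\sum_{t=1}^T\big[f_t(x_t) - f_t(x_*)\big] + \sum_{i=1}^m\frac{\big[\sum_{t=1}^T g_i(x_t)\big]_+^2}{2(\delta\eta T + 1/\eta)} \le \frac{R^2}{2\eta} + \eta TG^2.$$

By setting $\delta = \eta$ and using the fact that $\sum_{t=1}^T f_t(x_t) - f_t(x_*) \ge -FT$, we have:

$$\sum_{t=1}^T\big[f_t(x_t) - f_t(x_*)\big] \le \frac{R^2}{2\eta} + \eta TG^2$$

and

$$\sum_{t=1}^T g_i(x_t) \le \sqrt{2\Big(\eta^2T + \frac{1}{\eta}\Big)\Big(\frac{R^2}{2\eta} + \eta TG^2 + FT\Big)}, \quad i \in [m].$$

Substituting the stated value for $\eta$, we get the desired bounds as mentioned in the theorem. Note that the conditions $\eta \le 1$ and $\eta(m + \delta^2\eta^2) \le 1/4$ in Lemma 11 are satisfied for the stated values of $\eta$ and $\delta$ as long as $T \ge 64(m+1)^3$. ■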
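The parameter condition and the two bounds in the proof of Theorem 12 are easy to evaluate directly. The sketch below transcribes, with $\eta = \delta = T^{-1/3}$, the regret bound $\frac{R^2}{2\eta} + \eta TG^2$ and the violation bound $\sqrt{2(\eta^2T + 1/\eta)(\frac{R^2}{2\eta} + \eta TG^2 + FT)}$; the function names are hypothetical, and $R$, $G$, $F$ stand for whatever problem-dependent constants apply.

```python
import math


def lemma11_condition(T, m):
    """Check eta <= 1 and eta*(m + delta^2*eta^2) <= 1/4 for eta = delta = T^(-1/3)."""
    eta = delta = T ** (-1.0 / 3.0)
    return eta <= 1.0 and eta * (m + delta ** 2 * eta ** 2) <= 0.25


def theorem12_bounds(T, R, G, F):
    """Regret and per-constraint violation bounds from the proof of Theorem 12."""
    eta = T ** (-1.0 / 3.0)
    regret = R ** 2 / (2 * eta) + eta * T * G ** 2
    violation = math.sqrt(2 * (eta ** 2 * T + 1 / eta) * (regret + F * T))
    return regret, violation
```

At $T = 64(m+1)^3$ we get $\eta = 1/(4(m+1))$, so $\eta(m + \delta^2\eta^2) < (m+1)/(4(m+1)) = 1/4$, which is why that threshold suffices.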

Using the same trick as in Theorem 8, by introducing an appropriate $\gamma$ we are able to establish solutions that exactly satisfy the constraints in the long run with an $O(T^{2/3})$ regret bound, as shown in the following corollary. In the case when all the constraints are linear, that is, $g_i(x) = a_i^\top x - b_i \le 0, i \in [m]$, Assumption 1 is simplified into the following condition:

$$\min_{\alpha\in\Delta_m}\Big\|\sum_{i=1}^m \alpha_i a_i\Big\| \ge \sigma, \qquad (19)$$

where $\Delta_m$ is the $m$-dimensional simplex, that is, $\Delta_m = \{\alpha \in \mathbb{R}^m_+ : \sum_{i=1}^m \alpha_i = 1\}$. This is because $g(x) = \max_{\alpha\in\Delta_m}\sum_{i=1}^m \alpha_i g_i(x)$ and, as a result, the (sub)gradient of $g(x)$ can always be written as $\partial g(x) = \sum_{i=1}^m \alpha_i\nabla g_i(x) = \sum_{i=1}^m \alpha_i a_i$ where $\alpha \in \Delta_m$. As an illustrative example, consider the case when the normal vectors $a_i, i \in [m]$ are linearly independent. In this case $\sum_{i=1}^m \alpha_i a_i \ne 0$ for every $\alpha \in \Delta_m$, so the condition in (19) holds for some $\sigma > 0$, which indicates that the assumption does not limit the applicability of the proposed algorithm.
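Checking condition (19) amounts to computing the distance from the origin to the convex hull of $a_1, \ldots, a_m$. One simple way to estimate it, which is our choice and not part of the paper, is the Frank-Wolfe method over the simplex with exact line search (also known as Gilbert's algorithm). Because every iterate is a point of the hull, the returned norm is an upper bound on the true minimum: a value near zero shows that (19) fails, while a positive value should be treated as an estimate rather than a certified $\sigma$. The function name is hypothetical.

```python
import math


def min_norm_in_hull(vectors, iters=2000):
    """Frank-Wolfe (Gilbert's algorithm) estimate of min_{alpha in simplex} ||sum_i alpha_i a_i||."""
    v = list(vectors[0])  # start at the vertex alpha = e_1
    for _ in range(iters):
        # Linear minimization oracle over the simplex: the vertex a_k minimizing <a_k, v>.
        k = min(range(len(vectors)), key=lambda i: sum(p * q for p, q in zip(vectors[i], v)))
        s = [ak - vk for ak, vk in zip(vectors[k], v)]
        ss = sum(c * c for c in s)
        if ss == 0.0:
            break
        # Exact line search for the quadratic ||v + tau * s||^2 restricted to tau in [0, 1].
        tau = max(0.0, min(1.0, -sum(vk * sk for vk, sk in zip(v, s)) / ss))
        if tau == 0.0:
            break
        v = [vk + tau * sk for vk, sk in zip(v, s)]
    return math.sqrt(sum(c * c for c in v))
```

For orthonormal normals $e_1, e_2$ the minimum is attained at $\alpha = (1/2, 1/2)$ with value $1/\sqrt{2}$, while opposite normals $e_1, -e_1$ give $0$, so (19) cannot hold for any $\sigma > 0$.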

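Corollary 13 Let $\eta = \delta = T^{-1/3}$. Let $x_t, t \in [T]$ be the sequence of solutions obtained by Algorithm 2 applied to the shifted constraints $g_i(x) + \gamma \le 0$, with $\gamma = bT^{-1/3}$ and $b = 2\sqrt{2F}$. With sufficiently large $T$, that is, $FT \ge R^2T^{1/3} + G^2T^{2/3}$, under Assumption 2 and the condition in (19), the solutions $x_t, t \in [T]$ satisfy the global constraints $\sum_{t=1}^T g_i(x_t) \le 0, i \in [m]$, and the regret $\mathcal{R}_T$ is bounded by

$$\mathcal{R}_T = \sum_{t=1}^T f_t(x_t) - f_t(x_*) \le \frac{R^2}{2}T^{1/3} + \Big(G^2 + \frac{bG}{\sigma}\Big)T^{2/3} = O(T^{2/3}).$$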
The proof is similar to that of Theorem 8 and we defer it to Appendix B. As indicated by Corollary 13, for any convex domain defined by a finite number of halfspaces, that is, a polyhedral set, one can replace the projection onto the polyhedral set with the projection onto a ball containing it, at the price of satisfying the constraints only in the long run and achieving an $O(T^{2/3})$ regret bound.



5. Online Convex Optimization with Long Term Constraints under 
Bandit Feedback for Domain 

We now turn to extending the gradient based convex-concave optimization algorithm discussed in Section 3 to the setting where the learner only receives partial feedback for constraints. More specifically, the exact definition of the domain $\mathcal{K}$ is not exposed to the learner, which only knows that the solution lies within a ball $\mathcal{B}$. Instead, after the learner submits a solution $x_t$, the oracle presents the learner with the convex loss function $f_t(x)$ and the maximum violation of the constraints for $x_t$, that is, $g(x_t) = \max_{i\in[m]} g_i(x_t)$. Recall that the function $g(x)$ defined in this way is Lipschitz continuous with constant $G$, as proved in Proposition 6. In this setting, the convex-concave function defined in (6) becomes

$$\mathcal{L}_t(x,\lambda) = f_t(x) + \lambda g(x) - \frac{\delta\eta}{2}\lambda^2.$$

The mentioned setting is closely tied to bandit online convex optimization. In the bandit setting, in contrast to the full information setting, only the cost of the chosen decision (i.e., the incurred loss $f_t(x_t)$) is revealed to the algorithm, not the function itself. There is a rich body of literature that deals with bandit online convex optimization. In the seminal papers of Flaxman et al. (2005) and Awerbuch and Kleinberg (2004), it has been shown that one can design algorithms with an $O(T^{3/4})$ regret bound even in the bandit setting, where only evaluations of the loss functions at a single point are revealed. If we specialize to the online bandit optimization of linear loss functions, Dani et al. (2007) proposed an inefficient algorithm with an $O(\sqrt{T})$ regret bound, and Abernethy et al. (2008) obtained an $O(\sqrt{T\log T})$ bound with an efficient algorithm if the convex set admits an efficiently computable self-concordant barrier. For general convex loss functions, Agarwal et al. (2010) proposed optimal algorithms in a new bandit setting, in which multiple points can be queried for the cost values. By using multiple evaluations, they showed that the modified online gradient descent algorithm can achieve an $O(\sqrt{T})$ regret bound in expectation.

Algorithm 3 gives a complete description of the proposed algorithm under the bandit setting, which is a slight modification of Algorithm 1. Algorithm 3 accesses the constraint function $g(x)$ at two points. To facilitate the analysis, we define

$$\hat{\mathcal{L}}_t(x,\lambda) = f_t(x) + \lambda\hat g(x) - \frac{\delta\eta}{2}\lambda^2,$$

where $\hat g(x)$ is the smoothed version of $g(x)$ defined as $\hat g(x) = \mathbb{E}_{v\in\mathbb{B}}[g(x + \zeta v)]$, with $v$ drawn uniformly from the unit ball $\mathbb{B}$; its gradient is $\nabla\hat g(x) = \frac{d}{\zeta}\mathbb{E}_{u\in\mathbb{S}}[g(x + \zeta u)u]$, where $\mathbb{S}$ denotes the unit sphere centered at the origin. Note that $\hat g(x)$ is Lipschitz continuous with the same constant $G$, and it is always differentiable even though $g(x)$ is not in our case.

Since we do not have access to the function $g(\cdot)$ to compute $\nabla_x\hat{\mathcal{L}}_t(x,\lambda)$, we need a way to estimate its gradient at point $x_t$. Our gradient estimation closely follows the idea in Agarwal et al. (2010) of querying the $g(x)$ function at two points. The main advantage of using two points to estimate the gradient, compared with the one-point gradient estimation used in Flaxman et al. (2005), is that the former has a bounded norm which is independent of $\zeta$, and this leads to improved regret bounds.

Algorithm 3 Multipoint Bandit Online Convex Optimization with Long Term Constraints
1: Input: constraint $g(x)$, step size $\eta$, constant $\delta > 0$, exploration parameter $\zeta > 0$, and shrinkage coefficient $\xi$
2: Initialization: $x_1 = 0$ and $\lambda_1 = 0$
3: for $t = 1, 2, \ldots, T$ do
4: Submit solution $x_t$
5: Receive the convex function $f_t(x)$ and experience loss $f_t(x_t)$
6: Select unit vector $u_t$ uniformly at random
7: Query $g(x)$ at points $x_t + \zeta u_t$ and $x_t - \zeta u_t$ and incur the average of them as violation of constraints
8: Compute $\tilde g_{x,t} = \nabla f_t(x_t) + \lambda_t\frac{d}{2\zeta}\big(g(x_t + \zeta u_t) - g(x_t - \zeta u_t)\big)u_t$
9: Compute $\tilde g_{\lambda,t} = \frac{1}{2}\big(g(x_t + \zeta u_t) + g(x_t - \zeta u_t)\big) - \delta\eta\lambda_t$
10: Update $x_t$ and $\lambda_t$ by
$$x_{t+1} = \Pi_{(1-\xi)\mathcal{B}}\big(x_t - \eta\tilde g_{x,t}\big), \qquad \lambda_{t+1} = \Pi_{[0,+\infty)}\big(\lambda_t + \eta\tilde g_{\lambda,t}\big)$$
11: end for
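A compact sketch of Algorithm 3 is given below. It assumes that $\mathcal{B}$ is the Euclidean ball of radius $R$ centered at the origin (so the requirement $\zeta \le \xi r$ from Flaxman et al. (2005) becomes $\zeta \le \xi R$), that the gradient of $f_t$ is available once $f_t$ is revealed, and that $g$ can only be evaluated; `bandit_ltc` and its arguments are hypothetical names, and the unit vector is drawn by normalizing a Gaussian vector.

```python
import math
import random


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= radius else [v * radius / norm for v in x]


def bandit_ltc(grad_f, g, d, T, R, eta, delta, zeta, xi, seed=0):
    """Sketch of Algorithm 3: the constraint g is accessed only through two values per round.

    grad_f(t, x): gradient of f_t at x, available once f_t is revealed.
    g(x):         returns the maximum violation max_i g_i(x).
    Returns the played points x_t and the averaged constraint values incurred at each round.
    """
    if not 0.0 < zeta <= xi * R:
        raise ValueError("need 0 < zeta <= xi * R so that x_t +/- zeta * u_t stays in the ball")
    rng = random.Random(seed)
    x, lam = [0.0] * d, 0.0
    xs, incurred = [], []
    for t in range(T):
        xs.append(x)  # submit x_t, after which f_t is revealed
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]  # uniform direction on the unit sphere
        g_plus = g([a + zeta * c for a, c in zip(x, u)])
        g_minus = g([a - zeta * c for a, c in zip(x, u)])
        incurred.append(0.5 * (g_plus + g_minus))
        coef = lam * d / (2.0 * zeta) * (g_plus - g_minus)
        g_x = [fi + coef * ui for fi, ui in zip(grad_f(t, x), u)]  # estimate of grad_x L_hat
        g_lam = 0.5 * (g_plus + g_minus) - delta * eta * lam       # estimate of grad_lambda L_hat
        x = project_ball([a - eta * c for a, c in zip(x, g_x)], (1.0 - xi) * R)
        lam = max(0.0, lam + eta * g_lam)
    return xs, incurred
```

Projecting onto the shrunk ball $(1-\xi)\mathcal{B}$ is what keeps both query points $x_t \pm \zeta u_t$ inside $\mathcal{B}$, since $\|x_t \pm \zeta u_t\| \le (1-\xi)R + \zeta \le R$.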

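The gradient estimators for $\nabla_x\hat{\mathcal{L}}_t(x_t,\lambda_t) = \nabla f_t(x_t) + \lambda_t\nabla\hat g(x_t)$ and $\nabla_\lambda\hat{\mathcal{L}}_t(x_t,\lambda_t) = \hat g(x_t) - \delta\eta\lambda_t$ in Algorithm 3 are computed by evaluating the $g(x)$ function at two random points around $x_t$ as

$$\tilde g_{x,t} = \nabla f_t(x_t) + \lambda_t\frac{d}{2\zeta}\big(g(x_t + \zeta u_t) - g(x_t - \zeta u_t)\big)u_t$$

and

$$\tilde g_{\lambda,t} = \frac{1}{2}\big(g(x_t + \zeta u_t) + g(x_t - \zeta u_t)\big) - \delta\eta\lambda_t,$$

where $u_t$ is chosen uniformly at random from the surface of the unit sphere. Using Stokes' theorem, Flaxman et al. (2005) showed that $\frac{d}{\zeta}g(x_t + \zeta u_t)u_t$ is a conditionally unbiased estimate of the gradient of $\hat g(x)$ at point $x_t$; since $u_t$ and $-u_t$ have the same distribution, the same holds for the two-point version $\frac{d}{2\zeta}\big(g(x_t + \zeta u_t) - g(x_t - \zeta u_t)\big)u_t$. To make sure that the randomized points around $x_t$ live inside the convex domain $\mathcal{B}$, we need to stay away from the boundary of the set such that the ball of radius $\zeta$ around $x_t$ is contained in $\mathcal{B}$. In particular, Flaxman et al. (2005) showed that for any $x \in (1-\xi)\mathcal{B}$ and any unit vector $u$, it holds that $x + \zeta u \in \mathcal{B}$ as soon as $\zeta \in [0, \xi r]$, where $r$ is the radius of a ball centered at the origin that is contained in $\mathcal{B}$.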
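The two-point construction can be sketched as follows. Drawing the direction by normalizing a Gaussian vector is a standard way to sample uniformly from the sphere and is assumed here rather than taken from the paper; the helper names are hypothetical. When $g$ is $G$-Lipschitz, the returned gradient estimate has norm at most $dG$, which is the property used in Lemma 14, and for a linear $g$ the averaged value equals $g(x_t)$ exactly.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform sample from the unit sphere via a normalized Gaussian vector."""
    while True:
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n = math.sqrt(sum(c * c for c in u))
        if n > 0.0:
            return [c / n for c in u]


def two_point_estimates(g, x, zeta, rng):
    """Two-point estimates used in Algorithm 3.

    Returns (grad_est, value_est): grad_est = d/(2*zeta) * (g(x+zeta*u) - g(x-zeta*u)) * u
    estimates the gradient of the smoothed g_hat at x; value_est averages the two queries."""
    d = len(x)
    u = random_unit_vector(d, rng)
    g_plus = g([xi + zeta * ui for xi, ui in zip(x, u)])
    g_minus = g([xi - zeta * ui for xi, ui in zip(x, u)])
    scale = d / (2.0 * zeta) * (g_plus - g_minus)
    return [scale * ui for ui in u], 0.5 * (g_plus + g_minus)
```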

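In order to facilitate the analysis of Algorithm 3, we define the convex-concave function $\mathcal{H}_t(\cdot,\cdot)$ as

$$\mathcal{H}_t(x,\lambda) = \hat{\mathcal{L}}_t(x,\lambda) + \big(\tilde g_{x,t} - \nabla_x\hat{\mathcal{L}}_t(x_t,\lambda_t)\big)^\top x + \big(\tilde g_{\lambda,t} - \nabla_\lambda\hat{\mathcal{L}}_t(x_t,\lambda_t)\big)\lambda. \qquad (20)$$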
It is easy to check that $\nabla_x\mathcal{H}_t(x_t,\lambda_t) = \tilde g_{x,t}$ and $\nabla_\lambda\mathcal{H}_t(x_t,\lambda_t) = \tilde g_{\lambda,t}$. By defining the functions $\mathcal{H}_t(x,\lambda)$, Algorithm 3 reduces to Algorithm 1 performing gradient descent on the functions $\mathcal{H}_t(x,\lambda)$, except that the projection is made onto the set $(1-\xi)\mathcal{B}$ instead of $\mathcal{B}$.

We begin our analysis by reproducing Proposition 3 for the functions $\mathcal{H}_t(\cdot,\cdot)$.

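Lemma 14 If Algorithm 1 is performed over the convex set $(1-\xi)\mathcal{B}$ with the functions $\mathcal{H}_t$ defined in (20), then for any $x \in (1-\xi)\mathcal{B}$ we have

$$\sum_{t=1}^T\mathcal{H}_t(x_t,\lambda) - \mathcal{H}_t(x,\lambda_t) \le \frac{R^2 + \lambda^2}{2\eta} + \eta(D^2 + G^2)T + \eta(d^2G^2 + \delta^2\eta^2)\sum_{t=1}^T\lambda_t^2.$$

Proof We have $\nabla_x\mathcal{H}_t(x_t,\lambda_t) = \tilde g_{x,t}$ and $\nabla_\lambda\mathcal{H}_t(x_t,\lambda_t) = \tilde g_{\lambda,t}$. It is straightforward to show that $\frac{d}{2\zeta}\big(g(x_t + \zeta u_t) - g(x_t - \zeta u_t)\big)u_t$ has norm bounded by $Gd$ (Agarwal et al., 2010). So, the norms of the gradients are bounded as $\|\tilde g_{x,t}\|^2 \le 2(G^2 + d^2G^2\lambda_t^2)$ and $|\tilde g_{\lambda,t}|^2 \le 2(D^2 + \delta^2\eta^2\lambda_t^2)$. Using Lemma 2 and adding the resulting inequalities for all rounds, we get the desired inequality. ■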

The following theorem gives the regret bound and the expected violation of the constraints in the long run for Algorithm 3.

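Theorem 15 Let

$$c = \sqrt{2(D^2 + G^2)}\Big(R + \frac{DG}{\delta R}\Big(3 + \frac{R}{r}\Big)\Big).$$

Set $\eta = R/\sqrt{2(D^2 + G^2)T}$. Choose $\delta$ such that $\delta \ge 2(d^2G^2 + \eta^2\delta^2)$. Let $\zeta = 1/T$ and $\xi = \zeta/r$. Let $x_t, t \in [T]$ be the sequence of solutions obtained by Algorithm 3. We then have, for any $x \in \mathcal{K}$,

$$\sum_{t=1}^T\mathbb{E}\big[f_t(x_t) - f_t(x)\big] \le \frac{GR}{r} + c\sqrt{T} = O(\sqrt{T}), \quad \text{and}$$

$$\mathbb{E}\Big[\sum_{t=1}^T g(x_t)\Big] \le G + \sqrt{\frac{2\big(\delta R^2 + 2(D^2 + G^2)\big)}{R\sqrt{2(D^2 + G^2)}}\Big(\frac{GR}{r} + c\sqrt{T} + FT\Big)\sqrt{T}} = O(T^{3/4}).$$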

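Proof Since $\mathcal{H}_t(\cdot,\cdot)$ differs from $\hat{\mathcal{L}}_t(\cdot,\cdot)$ only by terms that are linear in $x$ and $\lambda$ (see (20)), for any $x$ and $\lambda \ge 0$ we have

$$\hat{\mathcal{L}}_t(x_t,\lambda) - \hat{\mathcal{L}}_t(x,\lambda_t) = \mathcal{H}_t(x_t,\lambda) - \mathcal{H}_t(x,\lambda_t) + (x_t - x)^\top\big(\nabla_x\hat{\mathcal{L}}_t(x_t,\lambda_t) - \tilde g_{x,t}\big) + (\lambda - \lambda_t)\big(\nabla_\lambda\hat{\mathcal{L}}_t(x_t,\lambda_t) - \tilde g_{\lambda,t}\big).$$

Taking expectation, using the fact that $x_t$ and $\lambda_t$ are determined before $u_t$ is drawn, and summing for all $t$ from 1 to $T$, we get

$$\begin{aligned} \mathbb{E}\Big[\sum_{t=1}^T\hat{\mathcal{L}}_t(x_t,\lambda) - \hat{\mathcal{L}}_t(x,\lambda_t)\Big] &= \mathbb{E}\Big[\sum_{t=1}^T\mathcal{H}_t(x_t,\lambda) - \mathcal{H}_t(x,\lambda_t)\Big] \\ &\quad + \mathbb{E}\Big[\sum_{t=1}^T(x_t - x)^\top\big(\nabla_x\hat{\mathcal{L}}_t(x_t,\lambda_t) - \mathbb{E}_t[\tilde g_{x,t}]\big) + (\lambda - \lambda_t)\big(\nabla_\lambda\hat{\mathcal{L}}_t(x_t,\lambda_t) - \mathbb{E}_t[\tilde g_{\lambda,t}]\big)\Big], \end{aligned} \qquad (21)$$

where $\mathbb{E}_t[\cdot]$ denotes the expectation with respect to $u_t$ conditioned on the history up to round $t$.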
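Next we provide an upper bound on the difference between the gradients of the two functions. First, $\mathbb{E}_t[\tilde g_{x,t}] = \nabla_x\hat{\mathcal{L}}_t(x_t,\lambda_t)$, so $\tilde g_{x,t}$ is an unbiased estimator of $\nabla_x\hat{\mathcal{L}}_t(x_t,\lambda_t)$ and the first term inside the second expectation of (21) vanishes. Considering the update rule for $\lambda_{t+1}$, we have $|\lambda_{t+1}| \le (1 - \eta^2\delta)|\lambda_t| + \eta D$, which (provided $\delta\eta^2 \le 1$) implies that $|\lambda_t| \le D/(\delta\eta)$ for all $t$. So we obtain

$$\begin{aligned} (\lambda - \lambda_t)\big(\nabla_\lambda\hat{\mathcal{L}}_t(x_t,\lambda_t) - \mathbb{E}_t[\tilde g_{\lambda,t}]\big) &\le |\lambda_t - \lambda|\,\mathbb{E}_t\Big|\hat g(x_t) - \frac{1}{2}\big(g(x_t + \zeta u_t) + g(x_t - \zeta u_t)\big)\Big| \\ &\le \frac{D}{\delta\eta}\cdot 2G\zeta = \frac{2DG\zeta}{\delta\eta}, \end{aligned} \qquad (22)$$

where the last inequality follows from the Lipschitz property of the functions $g(x)$ and $\hat g(x)$ with the same constant $G$, which gives $|g(x_t \pm \zeta u_t) - g(x_t)| \le G\zeta$ and $|g(x_t) - \hat g(x_t)| \le G\zeta$. Combining the inequalities (21) and (22) and using Lemma 14 with the comparator $(1-\xi)x \in (1-\xi)\mathcal{B}$ for any $x \in \mathcal{K}$, we have

$$\mathbb{E}\Big[\sum_{t=1}^T\hat{\mathcal{L}}_t(x_t,\lambda) - \hat{\mathcal{L}}_t\big((1-\xi)x,\lambda_t\big)\Big] \le \frac{R^2 + \lambda^2}{2\eta} + \eta(D^2 + G^2)T + \eta(d^2G^2 + \delta^2\eta^2)\,\mathbb{E}\Big[\sum_{t=1}^T\lambda_t^2\Big] + \frac{2DG\zeta}{\delta\eta}T.$$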

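By expanding the left hand side of the above inequality, we obtain

$$\begin{aligned} &\sum_{t=1}^T\mathbb{E}\big[f_t(x_t) - f_t((1-\xi)x)\big] + \lambda\,\mathbb{E}\Big[\sum_{t=1}^T\hat g(x_t)\Big] - \mathbb{E}\Big[\sum_{t=1}^T\hat g((1-\xi)x)\lambda_t\Big] - \frac{\delta\eta T}{2}\lambda^2 + \frac{\delta\eta}{2}\mathbb{E}\Big[\sum_{t=1}^T\lambda_t^2\Big] \\ &\le \frac{R^2 + \lambda^2}{2\eta} + \eta(D^2 + G^2)T + \eta(d^2G^2 + \delta^2\eta^2)\,\mathbb{E}\Big[\sum_{t=1}^T\lambda_t^2\Big] + \frac{2DG\zeta}{\delta\eta}T. \end{aligned}$$

By choosing $\delta \ge 2(d^2G^2 + \eta^2\delta^2)$, we cancel the $\lambda_t^2$ terms from both sides and have

$$\begin{aligned} &\sum_{t=1}^T\mathbb{E}\big[f_t(x_t) - f_t((1-\xi)x)\big] + \lambda\,\mathbb{E}\Big[\sum_{t=1}^T\hat g(x_t)\Big] - \mathbb{E}\Big[\sum_{t=1}^T\hat g((1-\xi)x)\lambda_t\Big] - \Big(\frac{\delta\eta T}{2} + \frac{1}{2\eta}\Big)\lambda^2 \\ &\le \frac{R^2}{2\eta} + \eta(D^2 + G^2)T + \frac{2DG\zeta}{\delta\eta}T. \end{aligned} \qquad (23)$$

By convexity and the Lipschitz property of $f_t(x)$ and $g(x)$, for any $x \in \mathcal{K} \subseteq \mathcal{B}$ we have

$$f_t((1-\xi)x) \le (1-\xi)f_t(x) + \xi f_t(0) \le f_t(x) + GR\xi, \qquad (24)$$

$$g(x) \le \hat g(x) + G\zeta, \quad \text{and} \quad \hat g((1-\xi)x) \le g((1-\xi)x) + G\zeta \le g(x) + G\zeta + GR\xi. \qquad (25)$$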
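Plugging (24) and (25) back into (23), and using $g(x) \le 0$ and $\lambda_t \ge 0$, for any $x \in \mathcal{K}$ (in particular for the optimal solution $x_*$) we get

$$\begin{aligned} &\sum_{t=1}^T\mathbb{E}\big[f_t(x_t) - f_t(x)\big] + \lambda\,\mathbb{E}\Big[\sum_{t=1}^T g(x_t)\Big] - \Big(\frac{\delta\eta T}{2} + \frac{1}{2\eta}\Big)\lambda^2 - \lambda G\zeta T \\ &\le \frac{R^2}{2\eta} + \eta(D^2 + G^2)T + \frac{2DG\zeta}{\delta\eta}T + GR\xi T + (GR\xi + G\zeta)\,\mathbb{E}\Big[\sum_{t=1}^T\lambda_t\Big]. \end{aligned} \qquad (26)$$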









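Considering the fact that $\lambda_t \le D/(\delta\eta)$, we have $\sum_{t=1}^T\lambda_t \le DT/(\delta\eta)$. Plugging this back into (26) and rearranging the terms, we have

$$\begin{aligned} &\sum_{t=1}^T\mathbb{E}\big[f_t(x_t) - f_t(x)\big] + \lambda\,\mathbb{E}\Big[\sum_{t=1}^T g(x_t)\Big] - \Big(\frac{\delta\eta T}{2} + \frac{1}{2\eta}\Big)\lambda^2 - \lambda G\zeta T \\ &\le \frac{R^2}{2\eta} + \eta(D^2 + G^2)T + \frac{2DG\zeta}{\delta\eta}T + GR\xi T + (GR\xi + G\zeta)\frac{DT}{\delta\eta}. \end{aligned}$$

By setting $\zeta = 1/T$ and $\xi = \zeta/r$, we get

$$\sum_{t=1}^T\mathbb{E}\big[f_t(x_t) - f_t(x)\big] + \lambda\,\mathbb{E}\Big[\sum_{t=1}^T g(x_t)\Big] - \Big(\frac{\delta\eta T}{2} + \frac{1}{2\eta}\Big)\lambda^2 - \lambda G \le \frac{R^2}{2\eta} + \eta(D^2 + G^2)T + \frac{GR}{r} + \frac{DG}{\delta\eta}\Big(3 + \frac{R}{r}\Big),$$

which gives the stated regret bound (take $\lambda = 0$) after substituting $\eta = R/\sqrt{2(D^2 + G^2)T}$. Maximizing for $\lambda$ over the range $(0,+\infty)$ and using $\sum_{t=1}^T f_t(x_t) - f_t(x) \ge -FT$ yields the following inequality for the violation of constraints:

$$\mathbb{E}\Big[\sum_{t=1}^T g(x_t)\Big] \le G + \sqrt{4\Big(\frac{\delta\eta T}{2} + \frac{1}{2\eta}\Big)\Big(\frac{R^2}{2\eta} + \eta(D^2 + G^2)T + \frac{GR}{r} + \frac{DG}{\delta\eta}\Big(3 + \frac{R}{r}\Big) + FT\Big)}.$$

Plugging in the stated values of the parameters completes the proof. Note that $\delta = 4d^2G^2$ obeys the condition specified in the theorem for sufficiently large $T$. ■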
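The bound $\lambda_t \le D/(\delta\eta)$ used in this proof depends only on the dual recursion $\lambda_{t+1} = \max\{0, \lambda_t + \eta(v_t - \delta\eta\lambda_t)\}$ with observed constraint values $|v_t| \le D$ and $\delta\eta^2 \le 1$: by induction, $\lambda_t \le D/(\delta\eta)$ gives $(1 - \delta\eta^2)\lambda_t + \eta D \le D/(\delta\eta)$. The sketch below iterates this recursion for an arbitrary input sequence so the bound can be checked directly; the function name and inputs are hypothetical.

```python
import random


def max_multiplier(values, eta, delta):
    """Iterate lam <- max(0, lam + eta*(v - delta*eta*lam)) from lam = 0; return the largest lam."""
    if delta * eta ** 2 > 1.0:
        raise ValueError("the bound D/(delta*eta) assumes delta*eta**2 <= 1")
    lam, largest = 0.0, 0.0
    for v in values:
        lam = max(0.0, lam + eta * (v - delta * eta * lam))
        largest = max(largest, lam)
    return largest


if __name__ == "__main__":
    D, eta, delta = 2.0, 0.1, 4.0
    rng = random.Random(0)
    observed = [rng.uniform(-D, D) for _ in range(5000)]
    print(max_multiplier(observed, eta, delta), "<=", D / (delta * eta))
```

Feeding the worst case $v_t = D$ makes the multiplier converge to the fixed point $D/(\delta\eta)$ from below, showing the bound is tight.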



6. Conclusion 

In this study we have addressed the problem of online convex optimization with constraints, where we only need the constraints to be satisfied in the long run. In addition to the regret bound, which is the main tool in analyzing the performance of general online convex optimization algorithms, we defined the bound on the violation of constraints in the long term, which measures the cumulative violation of the solutions from the constraints over all rounds. Our setting allows online convex optimization to be solved without projecting the solutions onto the complex convex domain at each iteration, which may be computationally inefficient for complex domains. Our strategy is to turn the problem into an online convex-concave optimization problem and apply the online gradient descent algorithm to solve it. We have proposed efficient algorithms in three different settings: the violation of constraints is allowed, the constraints need to be exactly satisfied, and finally we do not have access to the target convex domain except that it is bounded by a ball. Moreover, for domains determined by linear constraints, we used the mirror prox method, a simple gradient based algorithm for variational inequalities, and obtained an $O(T^{2/3})$ bound for both the regret and the violation of the constraints.

Our work leaves open a number of interesting directions for future work. In particular, it would be interesting to see if it is possible to improve the bounds obtained in this paper, i.e., getting an $O(\sqrt{T})$ bound on the regret and a better bound than $O(T^{3/4})$ on the violation of constraints for general convex domains. Proving optimal lower bounds for the proposed setting also remains an open question. Also, it would be interesting to consider strongly convex loss or constraint functions. Finally, relaxing the assumption we made to exactly satisfy the constraints in the long run is an interesting problem to be investigated.
Appendix A. Proof of Theorem 1

We first show that when $\delta < 1$, there exist a loss function and a constraint function such that the violation of constraints is linear in $T$. To see this, we set $f_t(x) = w^\top x, t \in [T]$ and $g(x) = 1 - w^\top x$. Assume we start with an infeasible solution, that is, $g(x_1) > 0$ or $x_1^\top w < 1$. Given the solution $x_t$ obtained at the $t$th trial, using the standard gradient descent approach, we have $x_{t+1} = x_t - \eta(1-\delta)w$. Hence, if $x_t^\top w < 1$, we have $x_{t+1}^\top w \le x_t^\top w < 1$; therefore, if we start with an infeasible solution, all the solutions obtained over the trials will violate the constraint $g(x) \le 0$, leading to a linear number of violations of constraints. Based on this analysis, we assume $\delta > 1$ in the analysis below.
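The counterexample above can be reproduced in a few lines. The sketch below runs unprojected gradient descent on $w^\top x + \delta[1 - w^\top x]_+$ from an infeasible start, using the gradient $(1-\delta)w$ whenever the constraint is violated and $w$ otherwise; the vector $w$, the step size, and the function name are hypothetical choices.

```python
def penalty_ogd(w, x1, eta, delta, T):
    """Gradient descent on w^T x + delta * max(0, 1 - w^T x), started at x1.

    Returns the constraint values g(x_t) = 1 - w^T x_t for t = 1..T."""
    x = list(x1)
    values = []
    for _ in range(T):
        wx = sum(wi * xi for wi, xi in zip(w, x))
        values.append(1.0 - wx)
        # Infeasible point: gradient is (1 - delta) * w; otherwise the penalty is inactive.
        coef = (1.0 - delta) if wx < 1.0 else 1.0
        x = [xi - eta * coef * wi for xi, wi in zip(x, w)]
    return values
```

With $\delta = 0.5$, each step decreases $w^\top x_t$ by $\eta(1-\delta)\|w\|^2$, so the violation $g(x_t)$ only grows and the cumulative violation is at least $T \cdot g(x_1)$.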

Given a strongly convex loss function $f(x)$ with modulus $\gamma$, we consider a constrained optimization problem given by

$$\min_{g(x)\le 0} f(x),$$

which is equivalent to the following unconstrained optimization problem

$$\min_x f(x) + \lambda[g(x)]_+,$$

where $\lambda \ge 0$ is the Lagrangian multiplier. Since we can always scale $f(x)$ to make $\lambda \le 1/2$, it is safe to assume $\lambda \le 1/2 < \delta$. Let $x_*$ and $x_\delta$ be the optimal solutions to the optimization problems $\arg\min_{g(x)\le 0} f(x)$ and $\arg\min_x f(x) + \delta[g(x)]_+$, respectively. We choose $f(x)$ such that $\|\nabla f(x_*)\| > 0$, which leads to $x_\delta \ne x_*$. This holds because, according to the first order optimality condition, we have

$$\nabla f(x_*) = -\lambda\nabla g(x_*), \qquad \nabla f(x_\delta) = -\delta\nabla g(x_*),$$

and therefore $\nabla f(x_*) \ne \nabla f(x_\delta)$ when $\lambda < \delta$. Define $\Delta = f(x_\delta) - f(x_*)$. Since $\Delta \ge \gamma\|x_\delta - x_*\|^2/2$ due to the strong convexity of $f(x)$, we have $\Delta > 0$.
Let $\{x_t\}_{t=1}^T$ be the sequence of solutions generated by the OGD algorithm that minimizes the modified loss function $f(x) + \delta[g(x)]_+$. We have

$$\begin{aligned} \sum_{t=1}^T f(x_t) + \delta[g(x_t)]_+ &\ge T\min_x\big(f(x) + \delta[g(x)]_+\big) \\ &= T\big(f(x_\delta) + \delta[g(x_\delta)]_+\big) \ge T\big(f(x_\delta) + \lambda[g(x_\delta)]_+\big) \\ &= T\big(f(x_*) + \lambda[g(x_*)]_+\big) + T\big(f(x_\delta) + \lambda[g(x_\delta)]_+ - f(x_*) - \lambda[g(x_*)]_+\big) \\ &\ge T\min_{g(x)\le 0} f(x) + T\Delta. \end{aligned}$$

As a result, we have

$$\sum_{t=1}^T f(x_t) + \delta[g(x_t)]_+ - T\min_{g(x)\le 0} f(x) = \Omega(T),$$

implying that either the regret $\sum_{t=1}^T f(x_t) - Tf(x_*)$ or the violation of the constraints $\sum_{t=1}^T[g(x_t)]_+$ is linear in $T$.

To better understand the performance of the penalty based approach, here we analyze the performance of OGD in solving the online optimization problem in (3). The algorithm is analyzed using the following lemma from Zinkevich (2003).

Lemma 16 Let $x_1, x_2, \ldots, x_T$ be the sequence of solutions obtained by applying OGD on the sequence of bounded convex functions $f_1, f_2, \ldots, f_T$. Then, for any solution $x_* \in \mathcal{K}$ we have

$$\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x_*) \le \frac{R^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^T\|\nabla f_t(x_t)\|^2.$$

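We apply OGD to the functions $\hat f_t(x), t \in [T]$ defined in (3); that is, instead of updating the solution based on the gradient of $f_t(x)$, we update the solution by the gradient of $\hat f_t(x)$. Using Lemma 16, expanding the functions $\hat f_t(x)$ based on (3), and considering the fact that $\sum_{i=1}^m[g_i(x_*)]_+^2 = 0$, we get

$$\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x_*) + \frac{\delta}{2}\sum_{t=1}^T\sum_{i=1}^m[g_i(x_t)]_+^2 \le \frac{R^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^T\|\nabla\hat f_t(x_t)\|^2. \qquad (27)$$

From the definition of $\hat f_t(x)$, the norm of the gradient $\nabla\hat f_t(x_t)$ is bounded as follows

$$\|\nabla\hat f_t(x)\|^2 = \Big\|\nabla f_t(x) + \delta\sum_{i=1}^m[g_i(x)]_+\nabla g_i(x)\Big\|^2 \le 2G^2(1 + m^2\delta^2D^2), \qquad (28)$$

where the inequality holds because $(a_1 + a_2)^2 \le 2(a_1^2 + a_2^2)$, $\|\nabla f_t(x)\| \le G$, and $\|\sum_{i=1}^m[g_i(x)]_+\nabla g_i(x)\| \le mDG$. By substituting (28) into (27) we have:

$$\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x_*) + \frac{\delta}{2}\sum_{t=1}^T\sum_{i=1}^m[g_i(x_t)]_+^2 \le \frac{R^2}{2\eta} + \eta G^2(1 + m^2\delta^2D^2)T.$$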
Since $[\cdot]_+^2$ is a convex function, from Jensen's inequality and the fact that $\sum_{t=1}^T f_t(x_t) - f_t(x_*) \ge -FT$, we have:

$$\frac{\delta}{2T}\sum_{i=1}^m\Big[\sum_{t=1}^T g_i(x_t)\Big]_+^2 \le \frac{\delta}{2}\sum_{i=1}^m\sum_{t=1}^T[g_i(x_t)]_+^2 \le \frac{R^2}{2\eta} + \eta G^2(1 + m^2\delta^2D^2)T + FT. \qquad (29)$$

By minimizing the right hand side of (29) with respect to $\eta$, we get the regret bound as

$$\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x_*) \le RG\sqrt{2(1 + m^2\delta^2D^2)T} = O(\delta\sqrt{T}) \qquad (30)$$

and the bound for the violation of constraints as

$$\sum_{t=1}^T g_i(x_t) \le \sqrt{\frac{2T}{\delta}\Big(RG\sqrt{2(1 + m^2\delta^2D^2)T} + FT\Big)}, \quad i \in [m]. \qquad (31)$$

Examining the bounds obtained in (30) and (31), it turns out that in order to recover an $O(\sqrt{T})$ regret bound, we need to set $\delta$ to be a constant, leading to an $O(T)$ bound for the violation of constraints in the long run, which is not satisfactory at all.
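To see this trade-off numerically, the sketch below simply evaluates the right hand sides $RG\sqrt{2(1 + m^2\delta^2D^2)T}$ and $\sqrt{\frac{2T}{\delta}(RG\sqrt{2(1 + m^2\delta^2D^2)T} + FT)}$ for a given penalty parameter $\delta$. The arguments are placeholders for the problem-dependent constants $R$, $G$, $D$, $F$ and the number of constraints $m$, and the function name is hypothetical.

```python
import math


def penalty_bounds(T, R, G, D, F, m, delta):
    """Right-hand sides of the regret bound and the violation bound for penalty-based OGD."""
    regret = R * G * math.sqrt(2.0 * (1.0 + m ** 2 * delta ** 2 * D ** 2) * T)
    violation = math.sqrt(2.0 * T / delta * (regret + F * T))
    return regret, violation
```

Increasing $\delta$ inflates the regret bound roughly linearly, while the violation bound decreases only slowly, which is the tension described above.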



Appendix B. Proof of Corollary 13

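Let $x_\gamma$ be the optimal solution to $\min_{g(x)\le-\gamma}\sum_{t=1}^T f_t(x)$. Similar to the proof of Theorem 12, we have

$$\sum_{t=1}^T\big[f_t(x_t) - f_t(x_\gamma)\big] + \sum_{i=1}^m\frac{\big[\sum_{t=1}^T(g_i(x_t) + \gamma)\big]_+^2}{2(\delta\eta T + 1/\eta)} \le \frac{R^2}{2\eta} + \eta TG^2.$$

Using the stated values for the parameters $\eta = \delta = T^{-1/3}$ and applying the fact that $\sum_{t=1}^T f_t(x_t) - f_t(x_\gamma) \ge -FT$, we obtain

$$\sum_{t=1}^T f_t(x_t) - f_t(x_\gamma) \le \frac{R^2}{2}T^{1/3} + G^2T^{2/3} \qquad (32)$$

and

$$\sum_{t=1}^T\big(g_i(x_t) + \gamma\big) \le \sqrt{4T^{1/3}\Big(\frac{R^2}{2}T^{1/3} + G^2T^{2/3} + FT\Big)}, \quad i \in [m]. \qquad (33)$$

From Theorem 7 we have the bound

$$\sum_{t=1}^T f_t(x_\gamma) \le \sum_{t=1}^T f_t(x_*) + \frac{G}{\sigma}\gamma T. \qquad (34)$$

Combining inequalities (32) and (34) and substituting the stated value $\gamma = bT^{-1/3}$ yields the regret bound as desired. To obtain the bound for the violation of the constraints, from (33) we have

$$\sum_{t=1}^T g_i(x_t) \le 2\sqrt{T^{1/3}\Big(\frac{R^2}{2}T^{1/3} + G^2T^{2/3} + FT\Big)} - bT^{2/3}.$$

For sufficiently large values of $T$, that is, $FT \ge R^2T^{1/3} + G^2T^{2/3}$, we can simplify the above inequality as $\sum_{t=1}^T g_i(x_t) \le 2\sqrt{2F}\,T^{2/3} - bT^{2/3}$. By setting $b = 2\sqrt{2F}$, the zero bound on the violation of constraints is guaranteed.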
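The final simplification can be checked mechanically. With $\eta = \delta = T^{-1/3}$, the sketch below evaluates $\sqrt{4T^{1/3}(\frac{R^2}{2}T^{1/3} + G^2T^{2/3} + FT)}$ at the smallest $F$ allowed by the condition $FT \ge R^2T^{1/3} + G^2T^{2/3}$ and compares it with $2\sqrt{2F}\,T^{2/3}$, the amount offset by $\gamma T = bT^{2/3}$ when $b = 2\sqrt{2F}$. The helper names are hypothetical.

```python
import math


def corollary13_violation_bound(T, R, G, F):
    """sqrt(4 T^(1/3) (R^2 T^(1/3) / 2 + G^2 T^(2/3) + F T)), the bound on sum_t (g_i(x_t) + gamma)."""
    s = T ** (1.0 / 3.0)
    return math.sqrt(4.0 * s * (0.5 * R ** 2 * s + G ** 2 * s ** 2 + F * T))


def large_T_threshold_F(T, R, G):
    """Smallest F satisfying the 'sufficiently large T' condition F T >= R^2 T^(1/3) + G^2 T^(2/3)."""
    s = T ** (1.0 / 3.0)
    return (R ** 2 * s + G ** 2 * s ** 2) / T
```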